Concurrency & parallelism
Concurrency vs parallelism, threads vs processes vs async I/O, and the hazards (races, deadlocks) plus locks/mutexes vs semaphores.
Concurrency and parallelism sound interchangeable and aren’t. One is about structure (dealing with many things at once); the other is about execution (doing many things at the same instant). The hazards — races and deadlocks — and the tools to prevent them follow directly from that distinction, and from one question: is your bottleneck the CPU or the wait?
The three models
The first decision is what’s slowing you down. A CPU-bound workload (hashing, image resizing, number-crunching) spends its time computing, so it wants parallelism — real cores doing real work simultaneously. An I/O-bound workload (database calls, HTTP requests, file reads) spends its time waiting on something else, so it wants concurrency — often async I/O, where one thread fires thousands of requests and reacts as each completes, never blocking on any single wait.
The execution units sit on a spectrum of sharing vs safety:
- Threads share the process’s memory. Coordinating them is cheap (just touch a shared variable), but that same shared memory is exactly what lets them corrupt each other’s state — the source of race conditions.
- Processes have isolated memory. Nothing one process does can scribble on another’s data (safe), but they’re heavier to spawn and must communicate over IPC (pipes, sockets, shared-memory segments).
- Async I/O sidesteps the question: a single thread handles thousands of concurrent waits because none of them block. This is the Node/Python-asyncio model — superb for I/O, useless for CPU work (a heavy computation blocks the one loop and stalls everything; see the event loop in 4.1).
The hazards and the tools
A race condition is when the result depends on the unlucky interleaving of two threads touching shared state. A deadlock is when threads wait on each other’s locks forever. Locks/mutexes and semaphores are the primitives that prevent the first — and, if used carelessly, cause the second. That double edge is the whole story: the same tool that closes the race can hang the program.
| Tool | Use when | Avoid when | Gotcha |
|---|---|---|---|
| Mutex / lock | exactly one thread may touch a resource at a time | the section is read-only or contention is nil | lock ordering bugs cause deadlock; keep critical sections tiny |
| Semaphore (counting) | cap concurrency at N (e.g. 5 DB connections) | you only need mutual exclusion (use a mutex) | forgetting to release leaks permits until exhaustion |
| Async I/O | many concurrent I/O-bound ops on few threads | CPU-bound work — it'll block the loop | a blocking call in an async path stalls everything |
| Worker pool / processes | CPU-bound parallel work across cores | lightweight I/O — overhead isn't worth it | shared-memory threads can race; isolated processes need IPC |
Two threads run balance = balance + 100 on a shared balance of 0:
T1: read balance (0)
T2: read balance (0) ← both read before either writes
T1: write 0 + 100 = 100
T2: write 0 + 100 = 100 ← T1's update is lost; result is 100, not 200
FIX — make read-modify-write atomic with a mutex:
lock(m)
balance = balance + 100
unlock(m)
// now T2 waits for the lock; final balance is 200.The bug lives in the gap between read and write — a window where another thread can sneak in with a stale value. The += 100 looks atomic in source code but compiles to read, add, store; the mutex collapses those three steps into one indivisible operation, so no other thread can observe the in-between state. (Hardware atomics like compare-and-swap solve the same problem lock-free for simple cases, but the mutex generalizes to any critical section.)
Deadlock: the four conditions and the cheap fix
Deadlock needs all four of Coffman’s conditions at once: mutual exclusion (locks are exclusive), hold-and-wait (you hold one lock while requesting another), no preemption (a lock can’t be forcibly taken), and a circular wait (a cycle in the wait-for graph, as above). Break any single one and deadlock is impossible.
01 Learning objectives
0 / 3 done02 Knowledge check
- 01medium
Concurrency vs parallelism:
- 02hard
A mutex differs from a counting semaphore in that:
03 Interview questions
browse all ↗What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.
-
Concurrency vs parallelism — what's the difference?
Concurrency is about *dealing with* many tasks by interleaving them — making progress on several by switching between them, even on a single core. Parallelism is *doing* many tasks at the same instant on multiple cores.
Rob Pike's line: concurrency is about structure, parallelism is about execution. A single-threaded async server is concurrent but not parallel; a CPU-bound job split across 8 cores is parallel. You can have concurrency without parallelism and vice versa.
Follow-ups they push on- Can you have parallelism without concurrency?
- Where does Node's event loop sit on this axis?
Red flag Using the terms interchangeably. Interleaving on one core is concurrency, not parallelism.
source: GeeksforGeeks — Concurrency vs parallelism ↗ -
Threads vs processes — what's shared, what's isolated, and when do you pick each?
A process has its own isolated memory space; threads within a process share the same heap/address space. Threads are cheaper to create and communicate through shared memory; processes are heavier but isolated — a crash or memory corruption in one process can't directly corrupt another.
Pick threads for fine-grained shared-memory work where communication cost matters; pick processes for isolation and fault containment (and, in languages with a GIL like CPython, to get true CPU parallelism). The tradeoff is shared-memory speed vs. isolation and safety.
Follow-ups they push on- How do processes communicate without shared memory? (IPC, pipes, sockets)
- Why does the GIL push CPython to multiprocessing for CPU-bound work?
Red flag Assuming threads always parallelize CPU work — a global interpreter lock (CPython) serializes bytecode, so threads help I/O but not CPU-bound loops.
source: GeeksforGeeks — Difference between process and thread ↗ -
What is a race condition? Show a classic example and how to fix it.
A race condition is when the result depends on the unpredictable timing/interleaving of concurrent operations on shared state. Classic case: two threads run
balance = balance + 100. That's read-modify-write: both read the same old value, both add 100, both write back — one update is lost.Fix by making the critical section atomic: guard it with a mutex/lock, use an atomic increment, or use a compare-and-swap. The general principle is to serialize access to shared mutable state so only one thread is in the critical section at a time.
Follow-ups they push on- Why isn't `x++` atomic?
- What's a check-then-act race (e.g. 'if not exists, create')?
Red flag Assuming a single statement like `count++` is atomic — it compiles to load/add/store, which can interleave.
source: GeeksforGeeks — Race condition ↗ -
Mutex vs semaphore — define each and when you'd use which.
A mutex provides mutual exclusion: one holder at a time, and ownership matters — the thread that locks it should unlock it. Use it to protect a single shared resource / critical section.
A semaphore is a counter that permits up to N concurrent holders (
acquiredecrements,releaseincrements, block at zero). A binary semaphore (N=1) resembles a lock but has no ownership and is often used for signaling between threads. Use a counting semaphore to cap concurrency — e.g. limit to 10 simultaneous DB connections.Follow-ups they push on- What does 'ownership' give a mutex that a semaphore lacks?
- How would you bound a connection pool with a semaphore?
Red flag Treating a binary semaphore as a drop-in mutex. Without ownership, any thread can release it, which permits subtle bugs a mutex prevents.
source: GeeksforGeeks — Mutex vs semaphore ↗ -
What is a deadlock, what four conditions cause it, and how do you prevent it?
A deadlock is when threads wait forever on each other's locks. It needs all four Coffman conditions simultaneously: mutual exclusion, hold-and-wait, no preemption, and circular wait.
Break any one to prevent it. The most practical: impose a global lock-ordering so all threads acquire locks in the same order (kills circular wait); or acquire all locks at once (kills hold-and-wait); or use lock timeouts /
tryLockand back off. Example deadlock: thread A holds lock 1 and wants lock 2 while thread B holds lock 2 and wants lock 1.Follow-ups they push on- Which Coffman condition is easiest to remove in practice? (circular wait via ordering)
- Deadlock vs livelock vs starvation?
Red flag Adding 'just one more lock' to fix a race and creating a deadlock instead. Inconsistent lock acquisition order is the usual culprit.
source: GeeksforGeeks — Deadlock and conditions ↗ -
Why does async I/O let a single thread handle thousands of connections?
Most server work is I/O-bound — waiting on the network, disk, or a database. With blocking I/O each connection ties up a thread that just sits idle during the wait, so 10k connections need ~10k threads (expensive memory + context switching).
Non-blocking async I/O flips this: the thread issues the I/O and immediately moves on; the OS notifies it (epoll/kqueue) when data is ready, and a callback/continuation resumes. One thread multiplexes thousands of in-flight waits because no thread blocks on the wait. The catch: a CPU-bound task blocks the loop and starves everyone, so async shines for I/O-bound, not CPU-bound, work.
Follow-ups they push on- When does async hurt? (CPU-bound work blocking the event loop)
- How is this different from a thread-per-request server?
Red flag Believing async is faster for everything. It wins on I/O concurrency; a heavy CPU computation still blocks the single event-loop thread.
source: MDN — Asynchronous JavaScript / event loop ↗ -
What is optimistic vs pessimistic locking, and when do you pick each?
Pessimistic locking assumes conflicts are likely, so it locks the row/resource up front (
SELECT ... FOR UPDATE) and others wait. Safe but reduces concurrency and risks deadlocks.Optimistic locking assumes conflicts are rare: read freely, and at write time check a version number (or timestamp/ETag) — if it changed, someone else won, so reject and retry. Great for low-contention, read-heavy workloads; wasteful retries under high contention. Pick pessimistic for hot, highly-contended rows; optimistic for mostly-independent updates.
Follow-ups they push on- How does a version column implement optimistic locking?
- How does this map to HTTP's If-Match / 412?
Red flag Using optimistic locking on a hotly-contended counter — you'll thrash on retries. High contention favors pessimistic locking.
source: Martin Fowler — Optimistic Offline Lock ↗ -
What is a deadlock vs a livelock vs starvation? Distinguish all three.
Deadlock: threads are blocked forever, each waiting on a resource another holds — nobody moves (e.g. the AB/BA lock-ordering cycle). Livelock: threads aren't blocked and keep *changing state* in response to each other, but make no progress — like two people stepping aside in the same direction repeatedly in a hallway. Starvation: a thread *can* run but is perpetually denied the resource because others keep winning it (e.g. a low-priority thread under a greedy scheduler).
Key distinction: deadlock = stuck and idle; livelock = busy but unproductive; starvation = some progress overall, but one thread is unfairly shut out.
What a strong answer coversDeadlock: mutual blocking, zero activity, circular wait.
Livelock: active state changes but no forward progress.
Starvation: a thread is runnable but perpetually denied the resource.
Fairness/aging fixes starvation; lock ordering fixes deadlock.
Quick self-checkTwo threads each detect a conflict, both back off and immediately retry in lockstep, repeating forever without blocking. This is:
-
No — deadlocked threads are blocked and idle, not actively retrying.
-
Correct — they keep changing state in response to each other but make no progress.
-
No — starvation is one thread unfairly denied, not symmetric busy-retrying.
-
No — a race is a timing-dependent correctness bug, not a no-progress loop.
Follow-ups they push on- How can a naive retry-on-conflict loop cause livelock?
- How does priority aging address starvation?
Red flag Calling any 'no progress' situation a deadlock. Livelock threads are actively running, and starvation still has overall progress — different causes, different fixes.
source: GeeksforGeeks — Deadlock, Starvation, and Livelock ↗ -
Why are atomic operations and compare-and-swap (CAS) faster than locks for simple shared counters?
A mutex involves OS-level machinery: contended threads may block and be parked/woken by the scheduler, which costs context switches. CAS is a single hardware instruction — 'if memory still holds the value I read, swap in the new value, else fail' — so a lock-free counter just loops
read → compute → CAS, retry on failureentirely in user space with no blocking.Under low-to-moderate contention this is much cheaper. The tradeoff: CAS works for small single-word updates; complex multi-variable invariants still need locks, and very high contention makes CAS retry loops spin wastefully. This is the basis of lock-free/atomic data structures.
What a strong answer coversLocks can block threads → context-switch and scheduler overhead.
CAS is one atomic CPU instruction; lock-free loops stay in user space.
Great for single-word updates (counters, flags); not for multi-variable invariants.
Under heavy contention, CAS retry loops can spin and waste CPU.
Follow-ups they push on- What is the ABA problem in CAS-based algorithms?
- When does a spinning CAS loop perform worse than a mutex?
Red flag Assuming lock-free is always faster. CAS shines for tiny updates under modest contention; complex invariants and extreme contention can favor locks.
source: GeeksforGeeks — Compare and Swap (CAS) ↗ -
Debugging: a Node.js endpoint that does heavy synchronous JSON crypto makes ALL other requests slow. Why, and how do you fix it?
Node runs your JavaScript on a single event-loop thread. A heavy *synchronous* CPU task (a big loop, sync crypto, JSON over megabytes) doesn't yield, so it blocks the event loop — every other pending request, timer, and callback stalls until it finishes. Async I/O isn't the issue; CPU-bound sync work is.
Fixes: move the CPU work off the loop — use a worker thread (or
worker_threads/a child process), the async variant of the crypto API, or offload to a separate service/queue. The rule: never run long synchronous CPU work on the event loop thread.What a strong answer coversNode's JS executes on one event-loop thread; sync CPU work blocks everything.
Async I/O is fine — the culprit is synchronous CPU-bound code.
Fix: worker threads / child process / async crypto / offload to a service.
Symptom: latency spikes across unrelated endpoints during the heavy call.
Quick self-checkA synchronous CPU-heavy handler slows all Node.js requests. The correct fix is to:
-
Wrong — await doesn't yield mid-CPU-loop; the loop still blocks.
-
Correct — that frees the event-loop thread to serve other requests.
-
Wrong — it hides nothing; other requests still queue behind the blocked loop.
-
Wrong — irrelevant; all routes share the one event loop.
Follow-ups they push on- Why doesn't adding more async/await help a CPU-bound loop?
- When would you reach for a separate service vs a worker thread?
Red flag Trying to fix it by sprinkling `async/await`. Awaiting doesn't yield during a synchronous CPU loop — you must move the computation off the event-loop thread.
source: Node.js — Don't block the event loop ↗ -
What is thread starvation in a connection/thread pool, and how does it cause a 'deadlock' without any locks?
Pool starvation: every thread (or DB connection) in a bounded pool is busy waiting on a resource that can only be supplied by *another* task that's now stuck in the pool's queue with no thread to run it. No mutex is involved, yet the system wedges — a 'pool-induced deadlock'.
Classic case: a request handler holds a pooled thread and synchronously calls back into the same service/pool, which is exhausted; the inner call waits for a thread that will only free up when the outer call returns. Fixes: never block a pooled thread waiting on the same pool, size pools to account for nested calls, separate pools for distinct workloads (bulkheading), and add timeouts so waiters fail fast instead of hanging forever.
What a strong answer coversAll pool threads/connections busy → queued work can't get a worker.
A task blocking on work that needs the same exhausted pool wedges the system.
It looks like a deadlock but has no locks — it's resource exhaustion.
Fixes: bulkhead separate pools, avoid nested same-pool blocking, add timeouts.
Follow-ups they push on- How does the bulkhead pattern prevent one workload from starving others?
- Why do checkout/borrow timeouts help even if they don't fix the root cause?
Red flag Blocking a pooled worker on a call that itself needs a worker from the same exhausted pool. Use separate pools and timeouts, and avoid nested same-pool blocking.
source: Microsoft — Bulkhead pattern ↗ -
Why is double-checked locking for lazy singleton initialization subtly broken without proper memory visibility?
Double-checked locking checks
instance == null, locks only if null, checks again inside the lock, then constructs. The subtle bug is memory visibility / instruction reordering: object construction isn't atomic — the reference can become visible to other threads *before* the constructor's writes are flushed, so a second thread sees a non-null but partially-initialized object.The fix depends on the memory model: in Java, mark the field
volatile(which since JMM 5 establishes the needed happens-before ordering); other languages need their equivalent memory barrier / acquire-release semantics. The deeper lesson: correctness under concurrency needs the language's memory model guarantees, not just mutual exclusion.What a strong answer coversThe flaw is reordering/visibility, not the locking logic itself.
A thread can publish the reference before the constructor's writes are visible.
Fix in Java:
volatilefield (post-Java-5 memory model).Lesson: concurrency correctness requires memory-model guarantees, not just locks.
Quick self-checkWhat makes naive double-checked locking unsafe?
-
Wrong — the lock is taken at most once, only when instance is null.
-
Correct — without volatile/barriers a thread can see a partially-initialized object.
-
Misleading — the issue is visibility/ordering, not the check itself.
-
Wrong — they can; the issue is doing it safely.
Follow-ups they push on- What does `volatile` guarantee that a plain field doesn't?
- Why is a static holder/initialization-on-demand idiom often cleaner than DCL?
Red flag Believing the second null-check alone makes DCL safe. Without volatile/memory barriers, reordering can expose a half-constructed instance.
source: Wikipedia — Double-checked locking ↗ -
Thread-per-request vs event-loop (reactive) servers — what's the tradeoff at high concurrency?
Thread-per-request (classic Java/Tomcat, Apache prefork) assigns each connection a thread. The model is simple — blocking code reads top-to-bottom — but each thread costs ~1MB+ of stack and context-switch overhead, so tens of thousands of concurrent connections exhaust memory and the scheduler (the C10k problem).
Event-loop / reactive servers (Node, Netty, nginx) handle many connections on a few threads via non-blocking I/O and callbacks, scaling to huge connection counts with low memory. The cost is programming complexity (callbacks/async) and the danger that any blocking call freezes the loop. Threads suit moderate concurrency with blocking dependencies; event loops suit massive I/O-bound concurrency.
What a strong answer coversThread-per-request: simple blocking code, but per-thread memory + context switches cap concurrency.
Event loop: few threads, non-blocking I/O, scales to huge connection counts.
This is the classic C10k scaling story.
Event loops demand non-blocking code; one blocking call stalls everyone.
Follow-ups they push on- What is the C10k problem?
- How do virtual/green threads (e.g. Java loom, goroutines) blur this divide?
Red flag Assuming 'more threads = more scale'. Past a point, thread memory and context-switching dominate; that's exactly what event-loop models were built to avoid.
source: GeeksforGeeks — Thread per request vs event-driven model ↗