Streams & buffers
The four stream types, backpressure (why pipe/pipeline handle it), buffers for binary data, and the must-handle 'error' event.
Streams let you process data in chunks as it arrives instead of buffering the whole thing in memory. That’s how Node copies a 10GB file, or proxies a video, in a few kilobytes of RAM — and it’s why every Node HTTP request and response is a stream.
The four stream types
| Type | Direction | Real example |
|---|---|---|
| Readable | you read from it | fs.createReadStream, an incoming HTTP request, process.stdin |
| Writable | you write to it | fs.createWriteStream, an HTTP response, process.stdout |
| Duplex | read and write, independent channels | a TCP net.Socket (you send and receive separately) |
| Transform | a Duplex where output is a function of input | zlib.createGzip(), a cipher, a CSV-to-JSON parser |
The mnemonic: Readable is a source, Writable is a sink, Duplex is both ends of an independent pipe, and Transform is a Duplex that modifies the bytes flowing through it (compress, encrypt, parse). gzip is the textbook Transform — bytes in, smaller bytes out.
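One way to make the relationships in the table concrete is to check them at runtime. The sketch below is only an illustration using Node's built-in `stream`, `zlib`, and `net` modules; the variable names are assumptions, and the socket is never connected, only inspected.

```javascript
const { Duplex, Transform } = require("node:stream");
const zlib = require("node:zlib");
const net = require("node:net");

const gzip = zlib.createGzip();   // a Transform: bytes in, compressed bytes out
const socket = new net.Socket();  // a Duplex: unconnected here, only inspected

const gzipIsTransform = gzip instanceof Transform;     // true
const gzipIsDuplex = gzip instanceof Duplex;           // true: Transform extends Duplex
const socketIsDuplex = socket instanceof Duplex;       // true
const socketIsTransform = socket instanceof Transform; // false: its two sides are independent

console.log({ gzipIsTransform, gzipIsDuplex, socketIsDuplex, socketIsTransform });

// Release the underlying resources now that the inspection is done.
gzip.destroy();
socket.destroy();
```

Every Transform is a Duplex, but not every Duplex is a Transform, which is the distinction the mnemonic draws.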
Buffers: the bytes underneath
Before streams make sense, know what a chunk is. A Buffer is a fixed-length sequence of raw
bytes living outside the V8 heap. Strings in JS are UTF-16 and immutable; Buffer is how Node
handles arbitrary binary — file contents, network packets, image data.
```javascript
const buf = Buffer.from("héllo", "utf8"); // bytes, not characters
buf.length;            // 6 ("é" is two UTF-8 bytes, so length ≠ string length)
buf.toString("utf8");  // "héllo" back again
buf.toString("hex");   // "68c3a96c6c6f"
```
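To illustrate the difference between characters and bytes a little further, here is a minimal sketch. It assumes nothing beyond the global `Buffer`; the variable names are illustrative, and the string is written with a `\u00e9` escape so the "é" is unambiguously a single precomposed character.

```javascript
const text = "h\u00e9llo"; // "héllo"
const bytes = Buffer.from(text, "utf8");

const charCount = text.length;                      // 5 UTF-16 code units
const byteCount = Buffer.byteLength(text, "utf8");  // 6 bytes once encoded
const isUint8Array = bytes instanceof Uint8Array;   // true: Buffer subclasses Uint8Array

// subarray() returns a view over the same memory, not a copy.
const firstByte = bytes.subarray(0, 1);
firstByte[0] = 0x48; // ASCII "H", written through to `bytes`
const afterWrite = bytes.toString("utf8"); // "Héllo"

console.log({ charCount, byteCount, isUint8Array, afterWrite });
```

Because views share memory with the original Buffer, copy the bytes first when a chunk needs to be kept unchanged while the original is modified.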
Backpressure — the whole point of streams
Imagine reading a fast SSD and writing to a slow network. The readable produces chunks faster than the writable can flush them. Without coordination, the unflushed chunks pile up in memory until the process is killed. Backpressure is the protocol that prevents this.
The low-level signal lives in writable.write():
- `write(chunk)` returns `true` → the internal buffer is below its high-water mark; keep going.
- `write(chunk)` returns `false` → the buffer is full; stop writing and wait.
- The writable later emits a `'drain'` event → the buffer has emptied; safe to resume.
Honoring this by hand is tedious and error-prone:
```javascript
// Manual backpressure: correct but fiddly
function copy(readable, writable) {
  readable.on("data", (chunk) => {
    const ok = writable.write(chunk);
    if (!ok) {
      readable.pause(); // stop reading until drained
      writable.once("drain", () => readable.resume());
    }
  });
  readable.on("end", () => writable.end());
}
```
```javascript
const fs = require("node:fs");

const r = fs.createReadStream("big.bin"); // fast
const w = fs.createWriteStream("out.bin"); // slow
r.on("data", (chunk) => {
  const ok = w.write(chunk);
  console.log("wrote chunk, ok =", ok);
  if (!ok) { r.pause(); console.log("paused"); }
});
w.on("drain", () => { r.resume(); console.log("drained, resumed"); });
r.on("end", () => { w.end(); console.log("end"); });
```
Representative output (the source outpaces the sink):
```
wrote chunk, ok = true
wrote chunk, ok = true
wrote chunk, ok = false
paused
drained, resumed
wrote chunk, ok = true
...
end
```
The first writes return true while the sink's buffer is below its high-water mark. Once it fills, write() returns false, we pause(), and no more 'data' events fire until the sink emits 'drain'. We resume(), more chunks flow, and the cycle repeats until 'end'. The exact sequence depends on the chunk size and the sink's high-water mark, so false can appear as early as the first write. That false → pause → drain → resume loop is precisely what pipeline() does for you. Getting it wrong by hand (ignoring the false) is how a copy loop buffers the whole file in memory.
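The signal itself can be observed in isolation with a deliberately tiny sink. This is a minimal sketch, not a realistic workload: `slowSink`, its 4-byte `highWaterMark`, and the `setImmediate` delay are all assumptions chosen to make the threshold easy to see.

```javascript
const { Writable } = require("node:stream");

// Hypothetical slow sink: a 4-byte high-water mark and writes that
// complete on a later turn of the event loop.
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, _encoding, callback) {
    setImmediate(callback);
  },
});

const firstWrite = slowSink.write("ab");  // 2 bytes pending: below the mark
const secondWrite = slowSink.write("cd"); // 4 bytes pending: mark reached

slowSink.once("drain", () => console.log("drain: safe to write again"));
console.log({ firstWrite, secondWrite }); // { firstWrite: true, secondWrite: false }
```

Note that the chunk is still accepted when `write()` returns `false`; the return value is advice to stop, not a rejection.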
Let pipeline do it for you
pipe() and especially pipeline() handle backpressure automatically — pausing the source when
the destination is full and resuming on drain. Prefer pipeline() because it also propagates
errors and cleans up every stream (closing file descriptors) when any one fails.
```javascript
const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const zlib = require("node:zlib");

async function gzipFile(src, dest) {
  await pipeline(
    fs.createReadStream(src),   // Readable: source
    zlib.createGzip(),          // Transform: compress
    fs.createWriteStream(dest), // Writable: sink
  );
  // Resolves only when fully flushed. On any error it rejects AND
  // destroys all three streams, closing the file descriptors for you.
}
```
A 10GB file flows through this in chunks; peak memory is a few buffer-sized chunks, not 10GB. That constant memory footprint, regardless of input size, is the headline benefit of streaming.
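The error path is worth seeing too. The following sketch wraps the same three-stage pipeline in `try`/`catch`; `tryGzip` is a hypothetical helper name, and returning the error code instead of rethrowing is purely for illustration.

```javascript
const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const zlib = require("node:zlib");

// Hypothetical helper: resolves to "ok" on success, or to the error
// code (for example "ENOENT") if any stage of the pipeline fails.
async function tryGzip(src, dest) {
  try {
    await pipeline(
      fs.createReadStream(src),
      zlib.createGzip(),
      fs.createWriteStream(dest),
    );
    return "ok";
  } catch (err) {
    // All three streams have already been destroyed at this point.
    return err.code;
  }
}
```

One thing `pipeline()` does not do is remove a partially written destination file, so a caller that cares may want to delete `dest` after a failure.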
Interview questions
What gets asked on this topic, with how to approach each question, the follow-ups, and the trap to avoid.
Name the four stream types in Node and give a concrete example of each.
- Readable: you read data out of it. Examples: `fs.createReadStream(file)`, an incoming HTTP request (`req`).
- Writable: you write data into it. Examples: `fs.createWriteStream(file)`, an HTTP response (`res`), `process.stdout`.
- Duplex: readable *and* writable, two independent channels. Example: a TCP socket (`net.Socket`).
- Transform: a Duplex where the output is a function of the input. Examples: `zlib.createGzip()`, a crypto cipher, or a custom parser.

The value of streams: process data in chunks as it arrives instead of buffering the whole payload in memory.

Follow-ups they push on:
- How is a Transform stream different from a plain Duplex?
- Which stream type is an HTTP request, and which is the response?

Red flag: Saying Duplex and Transform are the same. Transform's output is derived from its input; a Duplex's two sides are unrelated.
source: Node.js docs, How to use streams
What is backpressure? What does it mean when stream.write() returns false, and what is the 'drain' event for?
Backpressure is the feedback that a fast producer is outpacing a slow consumer. Each writable stream has an internal buffer with a `highWaterMark`. When `write()` brings the buffered data up to or past that threshold, it returns `false`, a signal saying "stop writing, I'm full."

If you ignore it and keep writing, the buffer grows unbounded and memory balloons. The correct response: pause the source and wait for the `drain` event, which fires once the buffered data has been flushed, then resume.

You rarely wire this by hand: `pipe()` and `pipeline()` implement the pause/resume dance for you, which is exactly why they are preferred.

Follow-ups they push on:
- How does pipe() handle backpressure automatically?
- What is highWaterMark and what happens if you set it very high?

Red flag: Writing in a loop while ignoring write()'s return value, which causes unbounded memory growth under load.
source: Node.js docs, Stream backpressuring
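When the data comes from a loop rather than another stream, the same rule can be honored with `async`/`await`. This is a sketch under assumptions: `writeAll` is a hypothetical helper, and it relies on `events.once()`, which returns a promise that resolves when the named event fires and rejects if the stream emits `'error'` first.

```javascript
const { once } = require("node:events");

// Hypothetical helper: writes every chunk, waiting for 'drain' whenever
// write() returns false. Returns how many times it had to wait.
async function writeAll(writable, chunks) {
  let waits = 0;
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      waits += 1;
      await once(writable, "drain"); // rejects if the stream emits 'error'
    }
  }
  writable.end();
  await once(writable, "finish"); // everything has been flushed
  return waits;
}
```

The loop never holds more than roughly one high-water mark of unflushed data, which is the property a plain `for` loop that ignores the return value loses.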
You must read a 10GB file, transform each line, and write the result — on a box with 512MB RAM. How?
Stream it; never load the whole file. Build a pipeline of Readable → Transform → Writable so only small chunks are in memory at any moment, with backpressure keeping the buffers bounded:
```javascript
import fs from "node:fs";
import { pipeline } from "node:stream/promises";

// someLineTransform is a placeholder for your own Transform stream.
await pipeline(
  fs.createReadStream("in"),
  someLineTransform,
  fs.createWriteStream("out")
);
```
`pipeline` wires backpressure (the read pauses when the write is slow) and, crucially, propagates errors and cleans up every stream (destroying them) if any stage fails. Memory stays ~`highWaterMark`-sized, independent of the 10GB total. `fs.readFile` would have to buffer the entire 10GB in memory and fail.

Follow-ups they push on:
- Why prefer pipeline() over chaining .pipe()? (Error handling + cleanup.)
- How would you split the stream into lines before the transform?

Red flag: Reaching for fs.readFile / reading into one big Buffer. It cannot fit and OOMs the process.
source: Node.js docs, stream.pipeline
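For the line-splitting follow-up, one possible approach is to use async generator functions as pipeline stages, which `pipeline()` accepts between the source and the destination. The sketch below is an assumption-laden illustration: `splitLines`, `upperCaseLines`, and `transformFile` are hypothetical names, and uppercasing stands in for whatever per-line work is needed.

```javascript
const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const { StringDecoder } = require("node:string_decoder");

// Stage 1: turn arbitrary byte chunks into complete lines.
async function* splitLines(source) {
  const decoder = new StringDecoder("utf8"); // keeps split multi-byte characters intact
  let tail = "";
  for await (const chunk of source) {
    const parts = (tail + decoder.write(chunk)).split("\n");
    tail = parts.pop(); // last piece may be an incomplete line
    yield* parts;
  }
  tail += decoder.end();
  if (tail) yield tail; // final line without a trailing newline
}

// Stage 2: the per-line transform (uppercasing is only a stand-in).
async function* upperCaseLines(lines) {
  for await (const line of lines) yield line.toUpperCase() + "\n";
}

// Hypothetical entry point tying the stages together.
async function transformFile(src, dest) {
  await pipeline(
    fs.createReadStream(src),
    splitLines,
    upperCaseLines,
    fs.createWriteStream(dest),
  );
}
```

Only the current chunk and one partial line are held in memory at a time, so the footprint stays small as long as individual lines are not enormous.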
Why is pipeline() preferred over chaining .pipe()? What does each do about errors?
`a.pipe(b).pipe(c)` handles backpressure but not errors: if `b` emits `error`, `pipe` does not forward it or destroy the other streams. You are left with un-destroyed streams (leaked file descriptors/sockets) and an unhandled `error` event, which crashes the process if no listener exists.

`stream.pipeline(a, b, c, cb)` (or the promise form in `node:stream/promises`) wires the same backpressure and also forwards the first error to the callback/rejection and destroys every stream in the chain on completion or failure. That cleanup is the whole reason to prefer it.

Rule of thumb: use `pipeline` for anything with real error/cleanup needs; bare `.pipe` only for trivial throwaway cases.

Follow-ups they push on:
- What resource leaks when a .pipe chain errors mid-way?
- What does the promise version of pipeline let you do with async/await?

Red flag: Using long .pipe chains in production and assuming an error anywhere is handled. It is not.
source: Node.js docs, stream.pipeline
What is a Buffer, and why does Node need it when JavaScript already has strings and arrays?
A Buffer is a fixed-length chunk of raw binary memory outside the V8 heap. It is Node's way of handling bytes (files, TCP packets, images, crypto) and pre-dates `TypedArray` in the language. It is a subclass of `Uint8Array`.

JavaScript strings are UTF-16 text, not bytes; a regular array is boxed and heap-heavy. Binary protocols, file contents, and network frames are sequences of bytes. Buffer gives you direct, efficient access to them and lets you control the encoding when converting to/from strings (`buf.toString("utf8")`, `Buffer.from(str, "base64")`).

Gotcha: a multi-byte UTF-8 character can be split across two chunks; decode with `StringDecoder` or accumulate before `toString`.

Follow-ups they push on:
- What goes wrong if you call buf.toString() on a chunk that splits a multi-byte character?
- Why is Buffer allocated off the V8 heap?

Red flag: Treating chunk boundaries as character boundaries. Concatenating decoded chunks can corrupt multi-byte UTF-8.
source: Node.js docs, Buffer
This streaming code occasionally crashes the whole server with no stack trace pointing at user code. What's the most likely cause?
An unhandled `'error'` event on a stream. Streams are `EventEmitter`s, and `EventEmitter` has a special rule: if an `'error'` event is emitted and there is no `'error'` listener, Node *throws*, crashing the process.

With streams this is easy to hit: a read fails (file gone, socket reset), the source emits `error`, nothing is listening, and the server dies. The fix is to handle `error` on every stream, or (better) use `pipeline()`, which routes errors to one place and destroys the streams.
```javascript
rs.on("error", handle); // not optional
```
Follow-ups they push on:
- Why does an EventEmitter throw specifically on an unhandled 'error' event?
- How does pipeline() remove the need to attach error handlers to each stream?

Red flag: Handling 'data'/'end' but forgetting 'error', the one event whose absence crashes the process.
source: Node.js docs, Error handling with streams
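The `EventEmitter` rule can be reproduced without any stream at all. In this minimal sketch (variable names are illustrative), the `emit()` call is synchronous, so a `try`/`catch` can observe the throw; with a real stream the error is usually emitted later, from I/O, where no surrounding `try`/`catch` exists.

```javascript
const { EventEmitter } = require("node:events");

const emitter = new EventEmitter();

// No 'error' listener yet: emit() throws the error itself.
let thrown = null;
try {
  emitter.emit("error", new Error("boom"));
} catch (err) {
  thrown = err;
}

// With a listener attached, the same emit() is delivered normally.
let handled = null;
emitter.on("error", (err) => { handled = err; });
emitter.emit("error", new Error("handled"));

console.log(thrown.message, handled.message); // boom handled
```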
What are the two reading modes of a Readable stream (flowing vs paused), and how do you switch between them?
A Readable stream is in one of two modes:
- Paused (pull): you explicitly call `read()` to pull chunks. This is the default for a freshly created stream.
- Flowing (push): chunks are pushed at you as fast as they arrive via `'data'` events.

It switches to flowing when you attach a `'data'` listener, call `resume()`, or `pipe()` it. It goes back to paused with `pause()` when there are no pipe destinations, or by removing all pipe destinations with `unpipe()`. Removing the `'data'` listener on its own does not pause the stream.

The practical takeaway: attaching a `'data'` handler starts the firehose immediately. If your consumer is slow you must respect backpressure (or just use `pipe`/`pipeline`, which manages the mode for you).

Follow-ups they push on:
- What starts a stream flowing the moment you attach a 'data' listener?
- Which mode does pipe() put the source in?

Red flag: Adding a 'data' listener and assuming the stream waits for you. It starts pushing chunks immediately.
source: Node.js docs, Two reading modes
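The mode switches can be observed through the `readableFlowing` property, which is `null` before any consumer is attached, `true` while flowing, and `false` once explicitly paused. A small sketch, with `Readable.from()` used only as a convenient in-memory source:

```javascript
const { Readable } = require("node:stream");

const readable = Readable.from(["a", "b", "c"]);
const observed = [];

observed.push(readable.readableFlowing); // null: no consumer attached yet
readable.on("data", () => {});           // attaching 'data' starts flowing mode
observed.push(readable.readableFlowing); // true
readable.pause();
observed.push(readable.readableFlowing); // false
readable.resume();
observed.push(readable.readableFlowing); // true

console.log(observed); // [ null, true, false, true ]
```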
What is highWaterMark on a stream, and what actually happens if you set it very high vs very low?
`highWaterMark` is the buffer threshold that drives backpressure. For a Writable it's the byte (or object) count at which `write()` starts returning `false`; for a Readable it's how much data the stream buffers ahead via internal `read()` calls. The default for byte streams is 64 KiB in Node 22 and later (16 KiB in earlier releases); object-mode streams default to 16 objects.
- Set it very high: the stream buffers a lot before signaling backpressure, so more data sits in memory. You get fewer pause/resume cycles (possibly slightly higher throughput) at the cost of a bigger memory footprint, and a huge value can defeat the point of streaming.
- Set it very low: backpressure kicks in almost immediately, memory stays tiny, but you pay more overhead in frequent pause/resume and `read` calls, hurting throughput.

It's a memory-vs-throughput knob; the default is a sensible balance for most workloads.

What a strong answer covers:
- The buffer threshold that triggers backpressure (write() → false; readable buffers ahead).
- Default 64 KiB for byte streams (16 KiB before Node 22), 16 for object mode.
- Higher → more in-memory buffering, fewer pause/resume cycles, bigger footprint.
- Lower → tighter memory, more overhead from frequent backpressure signaling.

Follow-ups they push on:
- How does highWaterMark interact with the drain event?
- Why might a very high highWaterMark partially defeat the purpose of streaming?

Red flag: Cranking highWaterMark up to 'go faster'. It just buffers more in memory and can reintroduce OOM risk.
source: Node.js docs — Buffering / highWaterMark ↗ -
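Rather than memorizing the defaults, they can be read from the running Node version. This sketch assumes a Node release that exports `stream.getDefaultHighWaterMark()` (v18.17 / v19.9 or later); `tinySink` and its 1024-byte threshold are illustrative.

```javascript
const { Writable, getDefaultHighWaterMark } = require("node:stream");

// Defaults for the running Node version (the byte value differs by release).
const defaultBytes = getDefaultHighWaterMark(false);
const defaultObjects = getDefaultHighWaterMark(true); // 16

// A per-stream override: this hypothetical sink signals backpressure at 1 KiB.
const tinySink = new Writable({
  highWaterMark: 1024,
  write(_chunk, _encoding, callback) {
    callback();
  },
});

console.log({
  defaultBytes,
  defaultObjects,
  tinySinkMark: tinySink.writableHighWaterMark, // 1024
});
```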
What's the difference between Buffer.alloc(n) and Buffer.allocUnsafe(n), and why does the 'unsafe' one exist?
`Buffer.alloc(n)` allocates `n` bytes and zero-fills them: safe and predictable, but it pays the cost of writing zeros across the whole buffer.

`Buffer.allocUnsafe(n)` allocates `n` bytes without initializing them, so the memory may contain leftover bytes from previously freed allocations, potentially old data (passwords, keys, other requests). It's faster precisely because it skips the zero-fill.

The 'unsafe' version exists for hot paths where you're about to fully overwrite the buffer immediately (e.g. you `copy`/`fill` into all `n` bytes before reading). The danger is forgetting to overwrite some region and then sending/logging it, leaking stale memory. Default to `Buffer.alloc`; reach for `allocUnsafe` only when you'll write every byte before reading and have measured a real win.

Never use the `new Buffer(n)` constructor: it is deprecated, and in older Node versions it returned uninitialized memory.

What a strong answer covers:
- `alloc` zero-fills (safe); `allocUnsafe` skips initialization (faster, may expose old memory).
- `allocUnsafe` may contain sensitive leftover bytes from freed allocations.
- Only safe when you fully overwrite every byte before any read.
- Avoid the deprecated `new Buffer()` constructor entirely.
Follow-ups they push on:
- What real security bug can leak from sending an under-written allocUnsafe buffer?
- Why was the old `new Buffer(n)` constructor deprecated?

Red flag: Using allocUnsafe and not overwriting every byte. You can leak stale memory into output.
source: Node.js docs, Buffer.allocUnsafe
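A short sketch of the safe usage pattern follows. The variable names and the 4-byte header are illustrative assumptions; the point is that every byte of an `allocUnsafe` buffer is written before anything reads it.

```javascript
// Zero-filled: safe to read or send immediately.
const zeroed = Buffer.alloc(8);

// Uninitialized: contents are undefined until every byte is written.
const header = Buffer.allocUnsafe(4);
header.writeUInt32BE(0xcafebabe, 0); // overwrites all 4 bytes

// fill() is another way to overwrite the whole buffer before use.
const padding = Buffer.allocUnsafe(8).fill(0xff);

console.log(zeroed.toString("hex"));  // 0000000000000000
console.log(header.toString("hex"));  // cafebabe
console.log(padding.toString("hex")); // ffffffffffffffff
```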
Sketch a custom Transform stream that uppercases text. What are the _transform and _flush methods for?
Subclass `Transform` (or pass a `transform` option) and implement `_transform(chunk, encoding, callback)`: process each incoming chunk, `push` any output, and call `callback()` to signal you're ready for the next chunk (or `callback(err)` to error the stream).
```javascript
import { Transform } from "node:stream";

const upper = new Transform({
  transform(chunk, _enc, cb) {
    // Fine for ASCII; use StringDecoder if multi-byte characters may span chunks.
    this.push(chunk.toString().toUpperCase());
    cb();
  },
});
```
`_flush(callback)` is optional and runs once, after the last chunk but before the stream ends. Use it to emit any buffered/trailing data (e.g. the final piece of a line-splitter that has a partial line left over). `_transform` is per-chunk; `_flush` is the one-time finalizer.

What a strong answer covers:
- `_transform(chunk, enc, cb)` runs per chunk: process, `this.push(...)`, then `cb()`.
- Call `cb(err)` to propagate errors; calling cb signals readiness for the next chunk (backpressure-aware).
- `_flush(cb)` runs once after the last chunk to emit any buffered/trailing output.
- Pass `{ transform, flush }` options or subclass; both work.

Follow-ups they push on:
- When is _flush essential? (Buffered/partial data like the last incomplete line.)
- How does calling the callback relate to backpressure on the readable side?

Red flag: Forgetting to call the _transform callback. The stream stalls because it never asks for the next chunk.
source: Node.js docs, Implementing a Transform stream
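To show `_flush` doing real work, here is a sketch of a different buffering Transform: one that regroups incoming bytes into fixed-size blocks and emits the leftover in `_flush`. The class name `FixedSizeBlocks` and its fields are hypothetical, and the readable side is put in object mode only so each block keeps its boundaries when consumed.

```javascript
const { Transform } = require("node:stream");

// Hypothetical Transform: regroups arbitrary chunks into blocks of
// exactly `blockSize` bytes; the last block may be shorter.
class FixedSizeBlocks extends Transform {
  constructor(blockSize) {
    super({ readableObjectMode: true }); // keep each block as its own chunk
    this.blockSize = blockSize;
    this.pending = Buffer.alloc(0);
  }

  _transform(chunk, _encoding, callback) {
    this.pending = Buffer.concat([this.pending, chunk]);
    while (this.pending.length >= this.blockSize) {
      this.push(this.pending.subarray(0, this.blockSize));
      this.pending = this.pending.subarray(this.blockSize);
    }
    callback(); // ready for the next chunk
  }

  _flush(callback) {
    // Runs once after the last chunk: emit whatever is still buffered.
    if (this.pending.length > 0) this.push(this.pending);
    callback();
  }
}
```

Without `_flush`, the trailing partial block would be silently dropped when the input ends.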
An Express handler does `fs.readFile(bigFile, (e, data) => res.send(data))` and the server OOMs under load. What's the streaming fix?
`fs.readFile` buffers the entire file into memory before sending. Under concurrency, N simultaneous requests for a big file means N full copies in RAM at once: memory use balloons and the process OOMs.

The fix is to stream the file straight to the response, so only small chunks are in memory and backpressure throttles reads to the client's download speed:
```javascript
import fs from "node:fs";
import { pipeline } from "node:stream/promises";

await pipeline(fs.createReadStream(bigFile), res);
```
`pipeline` wires backpressure (a slow client pauses the file read) and cleans up/propagates errors. Memory stays ~`highWaterMark`-sized per request, independent of file size. (Frameworks expose this as `res.sendFile`/`reply.send(stream)`, which stream under the hood.)

What a strong answer covers:
- `fs.readFile` loads the whole file into RAM; N concurrent requests = N full copies → OOM.
- Streaming sends chunks, so per-request memory ≈ highWaterMark regardless of file size.
- Backpressure throttles disk reads to the client's download rate.
- Use `pipeline(createReadStream, res)` (or `res.sendFile`) for error handling + cleanup.

Follow-ups they push on:
- Why does pipeline matter here over a bare .pipe to res?
- What does a slow client do to a streamed response vs a buffered one?

Red flag: Buffering whole files with readFile in a request handler. Fine in dev, OOMs under concurrent load.
source: Node.js docs, How to use streams
What does stream.finished() / the 'end' vs 'finish' vs 'close' events tell you, and which fires for readable vs writable?
Three lifecycle events that are easy to conflate:
- `'end'`: fires on a Readable when there's no more data to read (the source is exhausted).
- `'finish'`: fires on a Writable after `end()` is called and all data has been flushed to the underlying system.
- `'close'`: fires when the stream and its resources (file descriptor, socket) are destroyed/closed; it's the cleanup signal, on both kinds.

Because getting these right by hand is error-prone, `stream.finished(stream, cb)` (and its promise form) gives you one callback that fires when a stream is no longer readable/writable or errors, abstracting over end/finish/close/error. It's the robust way to know "this stream is truly done."

What a strong answer covers:
- `'end'` → Readable exhausted (no more data to read).
- `'finish'` → Writable flushed everything after `end()`.
- `'close'` → underlying resource destroyed; cleanup signal on either side.
- `stream.finished()` unifies end/finish/close/error into one done-or-failed callback.

Follow-ups they push on:
- Why might 'finish' fire but 'close' not, or vice versa?
- How is stream.finished safer than listening for 'end' yourself?

Red flag: Listening for 'end' on a Writable (it never fires there) or 'finish' on a Readable. Wrong event for the side.
source: Node.js docs, stream.finished()
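A minimal sketch of the promise form, assuming `finished` from `node:stream/promises`; `endAndWait` is a hypothetical helper that also records which lifecycle events it saw, purely for illustration.

```javascript
const { finished } = require("node:stream/promises");

// Hypothetical helper: ends a writable and waits until it is truly done.
// Resolves to the lifecycle events observed along the way.
async function endAndWait(writable, lastChunk) {
  const seen = [];
  writable.once("finish", () => seen.push("finish"));
  writable.once("close", () => seen.push("close"));
  writable.end(lastChunk);
  await finished(writable); // rejects if the stream errors or closes prematurely
  return seen;
}
```

The caller gets a single promise to await instead of juggling separate `'finish'`, `'close'`, and `'error'` listeners.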
What are object-mode streams, and async iteration over a stream (for await...of)? When would you use each?
Object mode (`{ objectMode: true }`) lets a stream's chunks be arbitrary JS values (objects, numbers) instead of Buffers/strings. Useful for pipelines of parsed records, e.g. a CSV row parser emitting objects into a Transform that validates them. In object mode `highWaterMark` counts objects, not bytes (default 16).

Async iteration: a Readable is async-iterable, so you can consume it with `for await...of`:
```javascript
import fs from "node:fs";

// handleChunk is a placeholder for your own async function.
for await (const chunk of fs.createReadStream(file)) {
  await handleChunk(chunk);
}
```
This reads chunks one at a time with built-in backpressure (the loop body's await pauses reading) and lets you use ordinary `try/catch` for errors, which is far more readable than wiring `'data'`/`'end'`/`'error'` by hand. Use it whenever you'd otherwise write event-handler boilerplate to consume a stream sequentially.

What a strong answer covers:
- Object mode: chunks are arbitrary JS values, not Buffers/strings; highWaterMark counts objects (default 16).
- Readables are async-iterable: `for await...of` consumes chunk-by-chunk.
- Async iteration has built-in backpressure and lets try/catch handle errors.
- Use object mode for record pipelines; async iteration to avoid 'data'/'end'/'error' boilerplate.

Follow-ups they push on:
- How does for await...of provide backpressure automatically?
- What happens to the stream if you break out of the for await loop early?

Red flag: Assuming chunks are always Buffers. In object mode they're whatever you pushed, and toString() would mangle them.
source: Node.js docs, Consuming readable streams with async iterators
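The two ideas combine naturally. The sketch below pushes plain objects through an object-mode Transform and consumes the result with `for await...of`; `createAgeFilter` and `collectValid` are hypothetical names, and the "numeric age" rule is an arbitrary stand-in for real validation.

```javascript
const { Readable, Transform, pipeline } = require("node:stream");

// Hypothetical validation stage: passes through only records whose
// `age` field is a number.
function createAgeFilter() {
  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      if (typeof record.age === "number") this.push(record);
      callback();
    },
  });
}

// The callback form of pipeline() returns the last stream, which is
// then consumed with for await...of.
async function collectValid(records) {
  const valid = pipeline(Readable.from(records), createAgeFilter(), (err) => {
    if (err) console.error("pipeline failed:", err.message);
  });
  const out = [];
  for await (const record of valid) out.push(record);
  return out;
}
```

Leaving the loop early with `break` or `return` destroys the stream by default, so no further records are produced.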
Why can `chunk.toString()` on each stream chunk corrupt text, and how do you decode multi-byte data safely?
Stream chunks split at arbitrary byte boundaries, not character boundaries. A multi-byte UTF-8 character (emoji, accented letters, CJK) can land with its first byte at the end of one chunk and the rest at the start of the next. Calling
chunk.toString("utf8")on each chunk independently then decodes a partial character — producing the replacement char `or mojibake — and you can't fix it by concatenating the broken strings afterward.Safe options:
- Use string_decoder.StringDecoder, which buffers incomplete multi-byte sequences across chunks and only emits complete characters.
- Or set the stream's encoding with setEncoding("utf8")(which uses StringDecoder internally) so 'data'yields decoded strings.Buffer.concat(...).toString()` once at the end (fine for small data, not for huge streams).
- Or accumulate the raw Buffers andWhat a strong answer coversChunks break on byte boundaries; a multi-byte char can straddle two chunks.
chunk.toString()per chunk decodes partial characters → garbled output you can't repair by concatenation.Use
StringDecoder(buffers incomplete sequences) orstream.setEncoding('utf8').Alternatively
Buffer.concatall chunks and decode once — only for small payloads.
Follow-ups they push on- Why can't you just concatenate the per-chunk decoded strings to fix it?
- When is Buffer.concat-then-decode acceptable vs StringDecoder?
Red flag Decoding each chunk with toString() independently — multi-byte characters spanning chunk boundaries corrupt.
source: Node.js docs — StringDecoder ↗
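The corruption is easy to reproduce by splitting a two-byte character by hand. In this sketch the chunk boundary is placed deliberately in the middle of "é" (bytes `c3 a9`); the variable names are illustrative.

```javascript
const { StringDecoder } = require("node:string_decoder");

const whole = Buffer.from("h\u00e9", "utf8"); // bytes: 68 c3 a9 ("hé")
// Simulate a chunk boundary that falls in the middle of "é".
const chunks = [whole.subarray(0, 2), whole.subarray(2)];

// Naive: decode each chunk on its own, then concatenate.
const naive = chunks.map((chunk) => chunk.toString("utf8")).join("");

// Safe: StringDecoder holds the incomplete byte until the rest arrives.
const decoder = new StringDecoder("utf8");
const safe = chunks.map((chunk) => decoder.write(chunk)).join("") + decoder.end();

console.log(naive === "h\u00e9"); // false: contains U+FFFD replacement characters
console.log(safe === "h\u00e9");  // true
```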