Streams & buffers

The four stream types, backpressure (why pipe/pipeline handle it), buffers for binary data, and the must-handle 'error' event.

Streams let you process data in chunks as it arrives instead of buffering the whole thing in memory. That’s how Node copies a 10GB file, or proxies a video, in a few kilobytes of RAM — and it’s why every Node HTTP request and response is a stream.

The four stream types

| Type | Direction | Real example |
| --- | --- | --- |
| Readable | you read from it | fs.createReadStream, an incoming HTTP request, process.stdin |
| Writable | you write to it | fs.createWriteStream, an HTTP response, process.stdout |
| Duplex | read and write, independent channels | a TCP net.Socket (you send and receive separately) |
| Transform | a Duplex where output is a function of input | zlib.createGzip(), a cipher, a CSV-to-JSON parser |
A Transform is a Duplex whose write side feeds its read side.

The mnemonic: Readable is a source, Writable is a sink, Duplex is both ends of an independent pipe, and Transform is a Duplex that modifies the bytes flowing through it (compress, encrypt, parse). gzip is the textbook Transform — bytes in, smaller bytes out.
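
The class relationships behind the mnemonic can be checked directly, since the stream classes are exported from node:stream. This is a minimal sketch: the variable names are illustrative, and the net.Socket is never connected, it is created only so its type can be inspected.

```javascript
const { Duplex, Transform } = require("node:stream");
const net = require("node:net");
const zlib = require("node:zlib");

const gzip = zlib.createGzip();   // a Transform: output is derived from input
const socket = new net.Socket();  // a Duplex: never connected here, only inspected

const gzipIsTransform = gzip instanceof Transform;
const gzipIsDuplex = gzip instanceof Duplex;          // every Transform is also a Duplex
const socketIsDuplex = socket instanceof Duplex;
const socketIsTransform = socket instanceof Transform; // a plain Duplex is not a Transform

console.log({ gzipIsTransform, gzipIsDuplex, socketIsDuplex, socketIsTransform });

// Release the underlying resources; nothing was written or connected.
gzip.destroy();
socket.destroy();
```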

Buffers: the bytes underneath

Before streams make sense, know what a chunk is. A Buffer is a fixed-length sequence of raw bytes living outside the V8 heap. Strings in JS are UTF-16 and immutable; Buffer is how Node handles arbitrary binary — file contents, network packets, image data.

const buf = Buffer.from("héllo", "utf8");   // bytes, not characters
buf.length;            // 6 — "é" is two UTF-8 bytes, so length ≠ string length
buf.toString("utf8");  // "héllo" back again
buf.toString("hex");   // "68c3a96c6c6f"
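
A slightly longer sketch of the same idea, assuming only built-in Buffer APIs: it contrasts character count with byte count, shows that a Buffer is a Uint8Array, and round-trips the bytes through base64. The variable names are illustrative.

```javascript
const text = "h\u00e9llo"; // "héllo", written with an escape to avoid encoding surprises
const bytes = Buffer.from(text, "utf8");

const charCount = text.length;                      // UTF-16 code units in the string
const byteCount = Buffer.byteLength(text, "utf8");  // bytes needed to encode it as UTF-8
const isTypedArray = bytes instanceof Uint8Array;   // Buffer subclasses Uint8Array

// Encoding is chosen at the string boundary, in both directions.
const asBase64 = bytes.toString("base64");
const roundTripped = Buffer.from(asBase64, "base64").toString("utf8");

console.log(charCount, byteCount, isTypedArray, roundTripped === text);
```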

Backpressure — the whole point of streams

Imagine reading a fast SSD and writing to a slow network. The readable produces chunks faster than the writable can flush them. Without coordination, the unflushed chunks pile up in memory until the process is killed. Backpressure is the protocol that prevents this.

The low-level signal lives in writable.write():

  • write(chunk) returns true → the internal buffer is below its high-water mark; keep going.
  • write(chunk) returns false → the buffer is full; stop writing and wait.
  • the writable later emits a 'drain' event → the buffer has emptied; safe to resume.
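
The three signals above can be observed with a deliberately tiny sink. This is a sketch under stated assumptions: the 4-byte highWaterMark and the 10 ms delay are illustrative values chosen to trigger backpressure quickly, not realistic settings.

```javascript
const { Writable } = require("node:stream");

// A slow sink with a tiny high-water mark (illustrative values).
const slowSink = new Writable({
  highWaterMark: 4,
  write(chunk, _encoding, callback) {
    setTimeout(callback, 10); // pretend each chunk takes 10 ms to flush
  },
});

const first = slowSink.write("abc");  // 3 bytes buffered: still below the mark
const second = slowSink.write("def"); // 6 bytes buffered: at or past the mark
console.log(first, second);           // true false

slowSink.once("drain", () => {
  console.log("drain: safe to write again");
  slowSink.end();
});
```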

Honoring this by hand is tedious and error-prone:

// Manual backpressure — correct but fiddly
function copy(readable, writable) {
  readable.on("data", (chunk) => {
    const ok = writable.write(chunk);
    if (!ok) {
      readable.pause();                       // stop reading until drained
      writable.once("drain", () => readable.resume());
    }
  });
  readable.on("end", () => writable.end());
}
FIG 1 · backpressure. A pipeline with a fast source (Readable), a gzip Transform, and a slow sink (Writable). Chunks flow left to right through the Transform. Each Writable has a bounded internal buffer (the high-water mark; the byte-stream default is 64 KiB since Node.js 22 and was 16 KiB in earlier versions). When that buffer fills, write() returns false, which is the backpressure signal, and the source pauses. Once the sink flushes and the buffer empties, it emits 'drain' and the source resumes. pipeline() wires this feedback loop for you, so peak memory stays bounded no matter how big the input is.
Predict the event order: pause/resume under backpressure
const fs = require("node:fs");

const r = fs.createReadStream("big.bin");   // fast
const w = fs.createWriteStream("out.bin");  // slow

r.on("data", (chunk) => {
  const ok = w.write(chunk);
  console.log("wrote chunk, ok =", ok);
  if (!ok) { r.pause(); console.log("paused"); }
});
w.on("drain", () => { r.resume(); console.log("drained, resumed"); });
r.on("end", () => { w.end(); console.log("end"); });

Representative output (the source outpaces the sink):

wrote chunk, ok = true
wrote chunk, ok = true
wrote chunk, ok = false
paused
drained, resumed
wrote chunk, ok = true
...
end

The first writes return true while the sink’s buffer is below its high-water mark. Once it fills, write() returns false, we pause(), and no more 'data' events fire until the sink emits 'drain'. We resume(), more chunks flow, and the cycle repeats until 'end'. That false → pause → drain → resume loop is precisely what pipeline() does for you — getting it wrong by hand (ignoring the false) is how a copy loop buffers the whole file in memory.

Let pipeline do it for you

pipe() and especially pipeline() handle backpressure automatically — pausing the source when the destination is full and resuming on drain. Prefer pipeline() because it also propagates errors and cleans up every stream (closing file descriptors) when any one fails.

Gzip a file with proper error handling and cleanup
const { pipeline } = require("node:stream/promises");
const fs = require("node:fs");
const zlib = require("node:zlib");

async function gzipFile(src, dest) {
  await pipeline(
    fs.createReadStream(src),     // Readable  — source
    zlib.createGzip(),            // Transform — compress
    fs.createWriteStream(dest),   // Writable  — sink
  );
  // Resolves only when fully flushed. On any error it rejects AND
  // destroys all three streams, closing the file descriptors for you.
}

A 10GB file flows through this in chunks; peak memory is a few buffer-sized chunks, not 10GB. That constant memory footprint — regardless of input size — is the headline benefit of streaming.


03 Knowledge check

  1. (easy) What is the main benefit of streaming a 10GB file?

  2. (medium) When is backpressure signalled?

04 Interview questions


  • [mid · concept · common] Name the four stream types in Node and give a concrete example of each.

    - Readable — you read data out of it. Example: fs.createReadStream(file), an incoming HTTP request (req).
    - Writable — you write data into it. Example: fs.createWriteStream(file), an HTTP response (res), process.stdout.
    - Duplex — readable *and* writable, two independent channels. Example: a TCP socket (net.Socket).
    - Transform — a Duplex where the output is a function of the input. Example: zlib.createGzip(), a crypto cipher, or a custom parser.

    The value of streams: process data in chunks as it arrives instead of buffering the whole payload in memory.

    Red flag: Saying Duplex and Transform are the same. A Transform's output is derived from its input; a Duplex's two sides are unrelated.

    source: Node.js docs, How to use streams
  • [senior · concept · common] What is backpressure? What does it mean when stream.write() returns false, and what is the 'drain' event for?

    Backpressure is the feedback that a fast producer is outpacing a slow consumer. Each writable stream has an internal buffer with a highWaterMark. When write() pushes the buffer past that threshold, it returns false — a signal saying "stop writing, I'm full."

    If you ignore it and keep writing, the buffer grows unbounded and memory balloons. The correct response: pause the source and wait for the 'drain' event, which fires once the buffered data has been flushed and it is safe to write again, then resume.

    You rarely wire this by hand — pipe() and pipeline() implement the pause/resume dance for you, which is exactly why they are preferred.
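
    When you do have to write by hand (for example from a loop rather than from another stream), one way to honor the signal is to await 'drain' with events.once. This is a minimal sketch: writeAll, slowSink, and the 2-byte highWaterMark are hypothetical names and values chosen for the demo.

    ```javascript
    const { Writable } = require("node:stream");
    const { once } = require("node:events");
    const { finished } = require("node:stream/promises");

    // Write every chunk, waiting for 'drain' whenever write() reports a full buffer.
    async function writeAll(writable, chunks) {
      for (const chunk of chunks) {
        const ok = writable.write(chunk);
        if (!ok) await once(writable, "drain");
      }
      writable.end();
      await finished(writable); // resolves once everything has been flushed
    }

    // Demo sink: tiny buffer and asynchronous writes, so backpressure actually occurs.
    const received = [];
    const slowSink = new Writable({
      highWaterMark: 2,
      write(chunk, _encoding, callback) {
        received.push(chunk.toString());
        setImmediate(callback);
      },
    });

    const demo = writeAll(slowSink, ["aa", "bb", "cc"]).then(() => {
      console.log(received.join(","));
    });
    ```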

    Red flag: Writing in a loop while ignoring write()'s return value, which causes unbounded memory growth under load.

    source: Node.js docs, Stream backpressuring
  • [mid · design · common] You must read a 10GB file, transform each line, and write the result, on a box with 512MB RAM. How?

    Stream it; never load the whole file. Build a pipeline of a Readable → Transform → Writable so only small chunks are in memory at any moment, with backpressure keeping the buffers bounded:

    ```
    import fs from "node:fs";
    import { pipeline } from "node:stream/promises";

    await pipeline(
      fs.createReadStream("in"),
      someLineTransform,            // your per-line Transform stream
      fs.createWriteStream("out"),
    );
    ```

    pipeline wires backpressure (the read pauses when the write is slow) and, crucially, propagates errors and cleans up every stream (destroying them) if any stage fails. Memory stays ~highWaterMark-sized, independent of the 10GB total. fs.readFile would try to buffer the entire 10GB and fail.
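
    The someLineTransform stage is left undefined in the answer. One possible shape for it is sketched below, assuming newline-delimited UTF-8 text; createLineTransform and mapLine are hypothetical names. It carries a partial line across chunks and emits the leftover in flush, and it always terminates output lines with "\n", including the last one.

    ```javascript
    const { Transform } = require("node:stream");
    const { StringDecoder } = require("node:string_decoder");

    // Hypothetical factory: applies mapLine to every line of a UTF-8 text stream.
    function createLineTransform(mapLine) {
      const decoder = new StringDecoder("utf8"); // keeps split multi-byte characters intact
      let partial = "";                          // text after the last newline seen so far

      return new Transform({
        transform(chunk, _encoding, callback) {
          const lines = (partial + decoder.write(chunk)).split("\n");
          partial = lines.pop(); // the last piece may be an incomplete line
          for (const line of lines) this.push(mapLine(line) + "\n");
          callback();
        },
        flush(callback) {
          partial += decoder.end();
          if (partial !== "") this.push(mapLine(partial) + "\n");
          callback();
        },
      });
    }
    ```

    It would be passed to pipeline in place of someLineTransform, for example createLineTransform((line) => line.toUpperCase()).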

    Red flag: Reaching for fs.readFile or reading into one big Buffer. It cannot fit and OOMs the process.

    source: Node.js docs, stream.pipeline
  • [senior · concept · occasional] Why is pipeline() preferred over chaining .pipe()? What does each do about errors?

    a.pipe(b).pipe(c) handles backpressure but not errors: if b emits error, pipe does not forward it or destroy the other streams. You are left with un-destroyed streams (leaked file descriptors/sockets) and an unhandled error event — which crashes the process if no listener exists.

    stream.pipeline(a, b, c, cb) (or the promise form from node:stream/promises) wires the same backpressure and additionally forwards the first error to the callback/rejection and destroys every stream in the chain on completion or failure. That cleanup is the whole reason to prefer it.

    Rule of thumb: use pipeline for anything with real error/cleanup needs; bare .pipe only for trivial throwaway cases.
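
    The cleanup difference can be observed with in-memory streams. This is a small sketch using PassThrough stand-ins rather than real files or sockets; the "disk full" error is simulated by destroying the destination.

    ```javascript
    const { PassThrough, pipeline } = require("node:stream");

    // .pipe(): the destination fails, but the source is left open.
    const pipedSource = new PassThrough();
    const pipedDest = new PassThrough();
    pipedSource.pipe(pipedDest);
    pipedDest.on("error", () => {}); // without a listener this error would crash the process
    pipedDest.destroy(new Error("disk full"));
    const pipeLeftSourceOpen = pipedSource.destroyed === false;

    // pipeline(): the same failure destroys the source and reports the error once.
    const source = new PassThrough();
    const dest = new PassThrough();
    const pipelineOutcome = new Promise((resolve) => {
      pipeline(source, dest, (err) => {
        resolve({ message: err.message, sourceDestroyed: source.destroyed });
      });
    });
    dest.destroy(new Error("disk full"));

    pipelineOutcome.then((outcome) => console.log(pipeLeftSourceOpen, outcome));
    ```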

    Red flag: Using long .pipe chains in production and assuming an error anywhere is handled. It is not.

    source: Node.js docs, stream.pipeline
  • [mid · concept · occasional] What is a Buffer, and why does Node need it when JavaScript already has strings and arrays?

    A Buffer is a fixed-length chunk of raw binary memory outside the V8 heap. It is Node's way of handling bytes (files, TCP packets, images, crypto), and it predates TypedArray in the language. Today it is a subclass of Uint8Array.

    JavaScript strings are UTF-16 text, not bytes; a regular array is boxed and heap-heavy. Binary protocols, file contents, and network frames are sequences of bytes — Buffer gives you direct, efficient access to them and lets you control the encoding when converting to/from strings (buf.toString("utf8"), Buffer.from(str, "base64")).

    Gotcha: a multi-byte UTF-8 character can be split across two chunks; decode with StringDecoder or accumulate before toString.

    Red flag: Treating chunk boundaries as character boundaries. Concatenating decoded chunks can corrupt multi-byte UTF-8.

    source: Node.js docs, Buffer
  • [mid · debug · common] This streaming code occasionally crashes the whole server with no stack trace pointing at user code. What's the most likely cause?

    An unhandled 'error' event on a stream. Streams are EventEmitters, and EventEmitter has a special rule: if an 'error' event is emitted and there is no 'error' listener, Node *throws* — crashing the process.

    With streams this is easy to hit: a read fails (file gone, socket reset), the source emits error, nothing is listening, and the server dies. The fix is to handle error on every stream, or — better — use pipeline(), which routes errors to one place and destroys the streams.

    ```
    rs.on("error", handle); // not optional
    ```
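
    The EventEmitter rule is easy to reproduce without any stream at all. In this sketch the emit is synchronous, so a try/catch can observe the throw; in a real stream the emit happens inside Node's internals, where no user try/catch surrounds it, which is why the process dies instead.

    ```javascript
    const { EventEmitter } = require("node:events");

    const emitter = new EventEmitter();

    // No 'error' listener: emit() throws the error itself.
    let thrown = null;
    try {
      emitter.emit("error", new Error("socket reset"));
    } catch (err) {
      thrown = err;
    }

    // With a listener attached, the same emit is delivered instead of thrown.
    const handled = [];
    emitter.on("error", (err) => handled.push(err.message));
    const hadListener = emitter.emit("error", new Error("socket reset"));

    console.log(thrown.message, hadListener, handled);
    ```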

    Red flag: Handling 'data'/'end' but forgetting 'error', the one event whose absence crashes the process.

    source: Node.js docs, Error handling with streams
  • [senior · concept · occasional] What are the two reading modes of a Readable stream (flowing vs paused), and how do you switch between them?

    A Readable stream is in one of two modes:

    - Paused (pull) — you explicitly call read() to pull chunks. This is the default for a freshly created stream.
    - Flowing (push) — chunks are pushed at you as fast as they arrive via 'data' events.

    It switches to flowing when you attach a 'data' listener, call resume(), or pipe() it. It goes back to paused with pause() when nothing is piped, or by removing all pipe destinations with unpipe(). Removing the 'data' listener alone does not pause the stream.

    The practical takeaway: attaching a 'data' handler starts the firehose immediately — if your consumer is slow you must respect backpressure (or just use pipe/pipeline, which manages the mode for you).
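
    The current mode is visible through the readableFlowing property (null before any consumer is attached, then true or false). A minimal sketch, using a Readable whose read() does nothing so that only the mode changes are observed:

    ```javascript
    const { Readable } = require("node:stream");

    // A source that never produces data; only the mode switches matter here.
    const readable = new Readable({ read() {} });

    const modes = [];
    modes.push(readable.readableFlowing); // nothing attached yet

    readable.on("data", () => {});        // attaching a 'data' listener starts flowing mode
    modes.push(readable.readableFlowing);

    readable.pause();                     // explicit pause
    modes.push(readable.readableFlowing);

    readable.resume();                    // explicit resume
    modes.push(readable.readableFlowing);

    console.log(modes); // [ null, true, false, true ]
    readable.destroy();
    ```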

    Red flag: Adding a 'data' listener and assuming the stream waits for you. It starts pushing chunks immediately.

    source: Node.js docs, Two reading modes
  • [senior · concept · occasional] What is highWaterMark on a stream, and what actually happens if you set it very high vs very low?

    highWaterMark is the buffer threshold that drives backpressure. For a Writable it's the byte (or object) count at which write() starts returning false; for a Readable it's how much data the stream buffers ahead via internal read() calls. The default for byte streams is 64 KiB since Node.js 22 (16 KiB in earlier versions), and 16 objects in object mode.

    - Set it very high: the stream buffers a lot before signaling backpressure, so more data sits in memory. You get fewer pause/resume cycles (possibly slightly higher throughput) at the cost of a bigger memory footprint, and a huge value can defeat the point of streaming.
    - Set it very low: backpressure kicks in almost immediately and memory stays tiny, but you pay more overhead in frequent pause/resume and read calls, hurting throughput.

    It's a memory-vs-throughput knob; the default is a sensible balance for most workloads.
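
    The effective threshold can be read back from any Writable through writableHighWaterMark. A small sketch with illustrative values; the byte-stream default printed last depends on the Node.js version in use.

    ```javascript
    const { Writable } = require("node:stream");

    // A sink that accepts everything immediately; only the thresholds are of interest.
    const accept = (_chunk, _encoding, callback) => callback();

    const tuned = new Writable({ highWaterMark: 1024, write: accept });
    const objects = new Writable({ objectMode: true, write: accept });
    const byteDefault = new Writable({ write: accept }).writableHighWaterMark;

    console.log(tuned.writableHighWaterMark);   // 1024 (bytes)
    console.log(objects.writableHighWaterMark); // 16 (objects, not bytes)
    console.log(byteDefault);                   // version-dependent byte default
    ```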

    What a strong answer covers
    • The buffer threshold that triggers backpressure (write() → false; readable buffers ahead).

    • Default 64 KiB for byte streams on Node.js 22+ (16 KiB before), 16 for object mode.

    • Higher → more in-memory buffering, fewer pause/resume cycles, bigger footprint.

    • Lower → tighter memory, more overhead from frequent backpressure signaling.

    Red flag: Cranking highWaterMark up to 'go faster'. It just buffers more in memory and can reintroduce OOM risk.

    source: Node.js docs, Buffering / highWaterMark
  • [mid · concept · occasional] What's the difference between Buffer.alloc(n) and Buffer.allocUnsafe(n), and why does the 'unsafe' one exist?

    Buffer.alloc(n) allocates n bytes and zero-fills them — safe, predictable, but it pays the cost of writing zeros across the whole buffer.

    Buffer.allocUnsafe(n) allocates n bytes without initializing them, so the memory may contain leftover bytes from previously freed allocations — potentially old data (passwords, keys, other requests). It's faster precisely because it skips the zero-fill.

    The 'unsafe' version exists for hot paths where you're about to fully overwrite the buffer immediately (e.g. you copy/fill into all n bytes before reading). The danger is forgetting to overwrite some region and then sending/logging it — leaking stale memory. Default to Buffer.alloc; reach for allocUnsafe only when you'll write every byte before reading and have measured a real win.

    Never use the deprecated new Buffer(n) constructor (DEP0005 in the Node.js docs); use Buffer.alloc, Buffer.allocUnsafe, or Buffer.from instead.
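
    A minimal sketch of the safe pattern described above: allocUnsafe is used only where every byte is written before anything reads the buffer. The 4-byte header and its value are illustrative.

    ```javascript
    // Zero-filled: safe to read or send immediately.
    const zeroed = Buffer.alloc(8);
    const allZero = zeroed.every((byte) => byte === 0);

    // Uninitialized: contents are unspecified until we overwrite them.
    const header = Buffer.allocUnsafe(4);
    header.writeUInt32BE(0xdeadbeef, 0); // writes all 4 bytes, so nothing stale remains

    console.log(allZero, header.toString("hex")); // true deadbeef
    ```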

    What a strong answer covers
    • alloc zero-fills (safe); allocUnsafe skips initialization (faster, may expose old memory).

    • allocUnsafe may contain sensitive leftover bytes from freed allocations.

    • Only safe when you fully overwrite every byte before any read.

    • Avoid the deprecated new Buffer() constructor entirely.

    Red flag: Using allocUnsafe and not overwriting every byte. You can leak stale heap memory into output.

    source: Node.js docs, Buffer.allocUnsafe
  • [mid · coding · occasional] Sketch a custom Transform stream that uppercases text. What are the _transform and _flush methods for?

    Subclass Transform (or pass a transform option) and implement _transform(chunk, encoding, callback): process each incoming chunk, push any output, and call callback() to signal you're ready for the next chunk (or callback(err) to error the stream).

    ```
    import { Transform } from "node:stream";

    const upper = new Transform({
      transform(chunk, _enc, cb) {
        this.push(chunk.toString().toUpperCase());
        cb();
      },
    });
    ```

    _flush(callback) is optional and runs once, after the last chunk but before the stream ends. Use it to emit any buffered or trailing data (e.g. the final piece of a line-splitter that has a partial line left over). _transform is per-chunk; _flush is the one-time finalizer.
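
    To see the two hooks side by side, here is a sketch of an uppercasing Transform that also counts bytes and emits a trailer from flush. createCountingUpper and the trailer format are hypothetical, and the per-chunk toString() assumes ASCII input (multi-byte text would need a StringDecoder).

    ```javascript
    const { Transform, Readable } = require("node:stream");

    // Hypothetical factory: uppercases each chunk, then appends a byte-count trailer.
    function createCountingUpper() {
      let bytes = 0;
      return new Transform({
        transform(chunk, _encoding, callback) {
          bytes += chunk.length;                 // runs once per chunk
          this.push(chunk.toString().toUpperCase());
          callback();
        },
        flush(callback) {
          this.push("\n[" + bytes + " bytes]");  // runs once, after the last chunk
          callback();
        },
      });
    }

    // Usage: feed two chunks through and collect the output.
    async function runDemo() {
      const upper = createCountingUpper();
      Readable.from(["ab", "cd"]).pipe(upper);
      let output = "";
      for await (const chunk of upper) output += chunk.toString();
      return output;
    }

    const demo = runDemo();
    demo.then((output) => console.log(output));
    ```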

    What a strong answer covers
    • _transform(chunk, enc, cb) runs per chunk: process, this.push(...), then cb().

    • Call cb(err) to propagate errors; calling cb signals readiness for the next chunk (backpressure-aware).

    • _flush(cb) runs once after the last chunk to emit any buffered/trailing output.

    • Pass { transform, flush } options or subclass — both work.

    Red flag: Forgetting to call the _transform callback. The stream stalls because it never asks for the next chunk.

    source: Node.js docs, Implementing a Transform stream
  • [mid · debug · common] An Express handler does `fs.readFile(bigFile, (e, data) => res.send(data))` and the server OOMs under load. What's the streaming fix?

    fs.readFile buffers the entire file into memory before sending. Under concurrency, N simultaneous requests for a big file means N full copies in RAM at once — the heap balloons and the process OOMs.

    The fix is to stream the file straight to the response, so only small chunks are in memory and backpressure throttles reads to the client's download speed:

    ```
    import fs from "node:fs";
    import { pipeline } from "node:stream/promises";
    // inside the (async) request handler:
    await pipeline(fs.createReadStream(bigFile), res);
    ```

    pipeline wires backpressure (a slow client pauses the file read) and cleans up/propagates errors. Memory stays ~highWaterMark-sized per request, independent of file size. (Frameworks expose this as res.sendFile/reply.send(stream), which stream under the hood.)
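
    In a handler, the awaited pipeline should sit inside a try/catch so a missing file or a dropped client does not become an unhandled rejection. The helper below is a sketch under assumptions: streamFileTo is a hypothetical name, and destination stands for any Writable, such as an HTTP response.

    ```javascript
    const fs = require("node:fs");
    const { pipeline } = require("node:stream/promises");

    // Hypothetical helper: streams filePath into any Writable (for example an HTTP response).
    // Resolves to true on success and false on failure.
    async function streamFileTo(filePath, destination) {
      try {
        await pipeline(fs.createReadStream(filePath), destination);
        return true;
      } catch (err) {
        // pipeline() has already destroyed both streams by the time we get here.
        console.error("stream failed:", err.message);
        return false;
      }
    }
    ```

    Because pipeline destroys the destination on error, a handler that wants to send a 404 for a missing file would need to check for the file before starting to stream.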

    What a strong answer covers
    • fs.readFile loads the whole file into RAM; N concurrent requests = N full copies → OOM.

    • Streaming sends chunks, so per-request memory ≈ highWaterMark regardless of file size.

    • Backpressure throttles disk reads to the client's download rate.

    • Use pipeline(createReadStream, res) (or res.sendFile) for error handling + cleanup.

    Red flag: Buffering whole files with readFile in a request handler. It is fine in dev but OOMs under concurrent load.

    source: Node.js docs, How to use streams
  • [senior · concept · occasional] What do stream.finished() and the 'end', 'finish', and 'close' events tell you, and which fires for readable vs writable?

    Three lifecycle events that interviewers conflate:

    - 'end' — fires on a Readable when there's no more data to read (the source is exhausted).
    - 'finish' — fires on a Writable after end() is called and all data has been flushed to the underlying system.
    - 'close' — fires when the stream and its resources (file descriptor, socket) are destroyed/closed; it's the cleanup signal, on both kinds.

    Because getting these right by hand is error-prone, stream.finished(stream, cb) (and its promise form) gives you one callback that resolves when a stream is no longer readable/writable or errors — abstracting over end/finish/close/error. It's the robust way to know "this stream is truly done."
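
    The event-to-side mapping can be recorded with two independent in-memory streams. This is a sketch, assuming default stream options (autoDestroy and emitClose enabled); recordEvents is a hypothetical helper.

    ```javascript
    const { Readable, Writable } = require("node:stream");
    const { once } = require("node:events");

    // Hypothetical helper: remembers which lifecycle events a stream emitted, in order.
    function recordEvents(stream) {
      const seen = [];
      for (const name of ["end", "finish", "close"]) {
        stream.once(name, () => seen.push(name));
      }
      return seen;
    }

    const source = Readable.from(["a", "b"]);
    const sink = new Writable({
      write(_chunk, _encoding, callback) { callback(); },
    });

    const sourceEvents = recordEvents(source);
    const sinkEvents = recordEvents(sink);

    source.resume();   // consume (and discard) the data so the source can reach 'end'
    sink.end("done");  // write a final chunk and finish the sink

    const bothClosed = Promise.all([once(source, "close"), once(sink, "close")]).then(() => {
      console.log("readable:", sourceEvents); // 'end' then 'close'
      console.log("writable:", sinkEvents);   // 'finish' then 'close'
    });
    ```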

    What a strong answer covers
    • 'end' → Readable exhausted (no more data to read).

    • 'finish' → Writable flushed everything after end().

    • 'close' → underlying resource destroyed; cleanup signal on either side.

    • stream.finished() unifies end/finish/close/error into one done-or-failed callback.

    Red flag: Listening for 'end' on a Writable (it never fires there) or 'finish' on a Readable. That is the wrong event for the side.

    source: Node.js docs, stream.finished()
  • [senior · concept · occasional] What are object-mode streams, and what is async iteration over a stream (for await...of)? When would you use each?

    Object mode ({ objectMode: true }) lets a stream's chunks be arbitrary JS values (objects, numbers) instead of Buffers/strings. Useful for pipelines of parsed records — e.g. a CSV row parser emitting objects into a Transform that validates them. In object mode highWaterMark counts objects, not bytes (default 16).

    Async iteration: a Readable is async-iterable, so you can consume it with for await...of:

    ```
    for await (const chunk of fs.createReadStream(file)) {
      handleChunk(chunk); // your per-chunk logic
    }
    ```

    This reads chunks one at a time with built-in backpressure (the stream is not read further while the loop body is awaiting) and lets you use ordinary try/catch for errors, which is far more readable than wiring 'data'/'end'/'error' by hand. Use it whenever you'd otherwise write event-handler boilerplate to consume a stream sequentially.
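
    Both ideas combine naturally: Readable.from creates an object-mode stream by default, and for await...of consumes it with try/catch error handling. A minimal sketch; collect and the sample records are hypothetical.

    ```javascript
    const { Readable } = require("node:stream");

    // Hypothetical helper: gathers every item from a readable, surfacing errors via try/catch.
    async function collect(readable) {
      const items = [];
      try {
        for await (const item of readable) {
          items.push(item); // in object mode each item is the value that was pushed
        }
      } catch (err) {
        console.error("stream failed:", err.message);
        throw err;
      }
      return items;
    }

    // Readable.from defaults to object mode, so chunks are plain objects, not Buffers.
    const records = Readable.from([{ id: 1 }, { id: 2 }]);
    const collected = collect(records);
    collected.then((items) => console.log(items.length)); // 2
    ```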

    What a strong answer covers
    • Object mode: chunks are arbitrary JS values, not Buffers/strings; highWaterMark counts objects (default 16).

    • Readables are async-iterable: for await...of consumes chunk-by-chunk.

    • Async iteration has built-in backpressure and lets try/catch handle errors.

    • Use object mode for record pipelines; async iteration to avoid 'data'/'end'/'error' boilerplate.

    Red flag: Assuming chunks are always Buffers. In object mode they're whatever you pushed, and toString() would mangle them.

    source: Node.js docs, Consuming readable streams with async iterators
  • [senior · debug · occasional] Why can `chunk.toString()` on each stream chunk corrupt text, and how do you decode multi-byte data safely?

    Stream chunks split at arbitrary byte boundaries, not character boundaries. A multi-byte UTF-8 character (emoji, accented letters, CJK) can land with its first byte at the end of one chunk and the rest at the start of the next. Calling chunk.toString("utf8") on each chunk independently then decodes a partial character — producing the replacement char ` or mojibake — and you can't fix it by concatenating the broken strings afterward.

    Safe options:
    - Use
    string_decoder.StringDecoder, which buffers incomplete multi-byte sequences across chunks and only emits complete characters.
    - Or set the stream's encoding with
    setEncoding("utf8") (which uses StringDecoder internally) so 'data' yields decoded strings.
    - Or accumulate the raw Buffers and
    Buffer.concat(...).toString()` once at the end (fine for small data, not for huge streams).
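
    The failure and the first fix can be reproduced by splitting a two-byte character by hand. A minimal sketch; the split point is chosen deliberately to fall inside the character.

    ```javascript
    const { StringDecoder } = require("node:string_decoder");

    const bytes = Buffer.from("\u00e9", "utf8"); // "é" encodes to two bytes: c3 a9
    const firstChunk = bytes.subarray(0, 1);     // simulate a chunk boundary mid-character
    const secondChunk = bytes.subarray(1);

    // Naive: each half decodes to a replacement character, and joining them cannot repair it.
    const naive = firstChunk.toString("utf8") + secondChunk.toString("utf8");

    // StringDecoder holds the incomplete byte until the rest of the character arrives.
    const decoder = new StringDecoder("utf8");
    const safe = decoder.write(firstChunk) + decoder.write(secondChunk) + decoder.end();

    console.log(naive === "\u00e9", safe === "\u00e9"); // false true
    ```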

    What a strong answer covers
    • Chunks break on byte boundaries; a multi-byte char can straddle two chunks.

    • chunk.toString() per chunk decodes partial characters → garbled output you can't repair by concatenation.

    • Use StringDecoder (buffers incomplete sequences) or stream.setEncoding('utf8').

    • Alternatively Buffer.concat all chunks and decode once — only for small payloads.

    Red flag: Decoding each chunk with toString() independently. Multi-byte characters spanning chunk boundaries get corrupted.

    source: Node.js docs, StringDecoder