Containers (Docker)

Image vs container, layers and layer caching (why Dockerfile order matters), writing a Dockerfile, multi-stage builds, and docker-compose.

A container packages your app with everything it needs to run — code, runtime, libraries — into one isolated unit that behaves identically on your laptop and in production, ending the “works on my machine” class of bug. The two ideas that make this fast and reproducible are images (immutable blueprints) and layers (cached, stackable filesystem diffs). Get the layer model and almost every Dockerfile best practice follows from it: order, caching, multi-stage builds, and small images are all the same insight applied.

Container vs VM — what you’re actually getting

Before the Dockerfile, fix the mental model of what a container is. A VM runs a full guest operating system on virtualized hardware via a hypervisor; a container is just an isolated process on the host’s own kernel, walled off with Linux namespaces (its own view of processes, network, filesystem) and cgroups (its CPU/memory budget). That difference is why a container starts in milliseconds and ships as megabytes while a VM boots in seconds and ships as gigabytes.

| Dimension | Container | Virtual machine |
| --- | --- | --- |
| Isolation unit | a process (shares host kernel) | a full guest OS on a hypervisor |
| Startup | milliseconds | seconds to minutes |
| Size on disk | MBs (layers, shared) | GBs (whole OS image) |
| Density | hundreds per host | tens per host |
| Isolation strength | weaker (kernel is shared) | stronger (separate kernels) |
| Reach for it | microservices, CI, packing many apps | different OS/kernel, hard multi-tenant isolation |

Containers trade some isolation for huge gains in density and startup speed; most app workloads want containers.

Images, containers, and the layer cache

The rule that should anchor your mental model: a Docker build runs the Dockerfile’s instructions top to bottom, and each instruction either hits the cache (reuse the existing layer) or misses (rebuild this layer and every layer after it). The single biggest speed and correctness win in a Dockerfile is ordering instructions from least- to most-frequently-changed so a code edit doesn’t invalidate your dependency install.

[Figure: an image is an ordered stack of read-only layers. In build order: FROM node:24-slim (base, cached); COPY package*.json ./ (cached); RUN npm ci (cached, the slow step); then the cache boundary, where a source edit starts invalidating; COPY . . (rebuilt on edit); RUN npm run build (rebuilt); CMD ["node", "dist/server.js"], with a writable layer added at runtime.]
FIG 1 · the layer stack + cache boundary. Each instruction is a layer. Editing source busts the COPY . . layer and every layer after it, but the dependency install before it stays cached, so the rebuild skips the slow npm ci step.
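
The cache rule can be sketched as a toy model. This is a simplification, not Docker's actual implementation: the function name `rebuilt_steps` and the input labels "lockfile" and "source" are hypothetical, and real builds compare file checksums and instruction text rather than labels.

```python
def rebuilt_steps(steps, changed_inputs):
    """Return the instructions that must re-run, given which inputs changed.

    steps: list of (instruction, inputs) pairs in Dockerfile order, where
    inputs is the set of hypothetical input labels the instruction reads.
    """
    rebuilt = []
    cache_valid = True
    for instruction, inputs in steps:
        # The first step whose inputs changed is a cache miss...
        if cache_valid and inputs & changed_inputs:
            cache_valid = False
        # ...and every step after a miss re-runs, even if its own inputs are unchanged.
        if not cache_valid:
            rebuilt.append(instruction)
    return rebuilt


deps_first = [
    ("FROM node:24-slim", set()),
    ("COPY package*.json ./", {"lockfile"}),
    ("RUN npm ci", set()),
    ("COPY . .", {"lockfile", "source"}),
    ("RUN npm run build", set()),
]

source_first = [
    ("FROM node:24-slim", set()),
    ("COPY . .", {"lockfile", "source"}),
    ("RUN npm ci", set()),
    ("RUN npm run build", set()),
]

# A source-only edit: the dependency install survives only in the deps-first order.
print(rebuilt_steps(deps_first, {"source"}))
print(rebuilt_steps(source_first, {"source"}))
```

With the dependency steps first, the source edit re-runs only the source copy and the build; with the source copied first, `RUN npm ci` lands after the miss and re-runs on every edit.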

Writing a Dockerfile — the instructions

You can get a long way with seven instructions. FROM sets the base image; WORKDIR sets (and creates) the working directory; COPY brings files into the image; RUN executes a build-time command and bakes the result into a layer; EXPOSE documents the port (it doesn’t publish it); CMD is the default command the container runs; ENV sets environment variables.

Multi-stage builds + non-root

The single best practice for production images: build in a fat stage with all your toolchain, then copy only the compiled output into a slim runtime stage. The resulting image ships no compilers, no dev dependencies, no source — smaller to pull, faster to start, and a far smaller attack surface. Running as a non-root user is the other must-do: by default a container process runs as root, so a container escape lands the attacker as root on the host.

A production multi-stage Node Dockerfile
# ---- Stage 1: build ----
FROM node:24-slim AS build
WORKDIR /app

# Deps first — cached until the lockfile changes
COPY package*.json ./
RUN npm ci

# Then source, then compile
COPY . .
RUN npm run build && npm prune --omit=dev

# ---- Stage 2: runtime ----
FROM node:24-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Copy only what runtime needs from the build stage
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/package.json ./

# Drop root: node:* images ship a non-root 'node' user
USER node

EXPOSE 3000
CMD ["node", "dist/server.js"]

The runtime stage starts clean from node:24-slim and pulls in only node_modules, dist, and package.json from the build stage — the source tree, build cache, and any dev tooling never reach the final image. USER node ensures the app runs unprivileged. Pair this with a .dockerignore:

node_modules
npm-debug.log
.git
.env
dist
Dockerfile
.dockerignore

Excluding node_modules and dist from the context means a stale local build can’t sneak into the image and bust your cache, and excluding .env/.git keeps secrets and history out of the image entirely.

docker-compose for local dev

For multi-container local development — your app plus a database plus a cache — docker-compose declares the whole stack in one YAML file and wires up a shared network so services reach each other by name.

A minimal compose stack: app + Postgres + Redis
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache

  db:
    image: postgres:17
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data

  cache:
    image: redis:7

volumes:
  pgdata:

docker compose up builds the app image, pulls the other two, and starts all three. The app reaches Postgres at host db and Redis at cache: compose’s built-in DNS resolves service names on the shared network. The named pgdata volume persists the database across compose down/up (unless you run down with -v, which also removes named volumes), so you don’t lose data when you restart the stack.
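
One practical consequence is worth a sketch: the short-form depends_on used above only orders container startup and does not wait for Postgres to accept connections, so an app may want to retry its first connection. The helper below is a minimal illustration under that assumption; `wait_for_service` and its parameters are hypothetical names, and it only checks that a TCP port is open, not that the database is ready for queries.

```python
import socket
import time
from urllib.parse import urlsplit


def wait_for_service(url, attempts=10, delay=0.5):
    """Return True once a TCP connection to the URL's host:port succeeds."""
    parts = urlsplit(url)
    for _ in range(attempts):
        try:
            # Inside the compose network, the hostname is the service name.
            with socket.create_connection((parts.hostname, parts.port), timeout=1):
                return True
        except OSError:
            time.sleep(delay)  # not reachable yet; back off and retry
    return False


# The host part of DATABASE_URL is the compose service name, not an IP address.
db = urlsplit("postgres://app:secret@db:5432/app")
print(db.hostname, db.port)
```

Inside the app container, a call such as `wait_for_service("postgres://app:secret@db:5432/app")` would block until the db service is listening or the attempts run out.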

03 Knowledge check

Knowledge check: 2 questions, pass ≥ 70%

  1. (medium) Why put `COPY package.json` + install BEFORE `COPY . .` in a Dockerfile?

  2. (medium) Multi-stage builds primarily give you:

04 Interview questions

What gets asked on this topic: for each question, how to approach it and the trap to avoid.

  • [Commonly asked · mid · concept · very common] What is the difference between a Docker image and a container?

    An image is the blueprint — an immutable, read-only stack of layers (filesystem + metadata like the default command) built from a Dockerfile. A container is a running (or stopped) instance of an image: Docker adds a thin writable layer on top of the read-only image layers and gives it an isolated process, network, and mount namespace.

    The analogy: image is to container as a class is to an object, or a program on disk is to a process. You can spin up many containers from one image; each gets its own writable layer, so changes inside one container do not affect the image or the other containers.

    Red flag Saying data persists in the image after a container writes to it — writes land in the container's ephemeral writable layer and vanish when the container is removed unless you mount a volume.

    source: Docker docs — Images and layers ↗
  • [Commonly asked · mid · concept · very common] Why does the order of instructions in a Dockerfile matter? How does layer caching work?

    Each Dockerfile instruction creates a layer. On rebuild, Docker reuses a cached layer as long as that instruction and everything it depends on are unchanged; the first instruction that changes invalidates that layer and every layer after it.

    So you order from least-frequently-changing to most-frequently-changing. The classic example for a Node app: COPY package.json then RUN npm install BEFORE COPY . .. Dependencies change rarely, so the expensive npm install layer stays cached across most builds; only the cheap source-copy layer rebuilds when you edit code. If you COPY . . first, every source edit busts the cache and reinstalls all dependencies.

    Red flag Copying the whole source tree before installing dependencies — every code change then invalidates the dependency-install layer and forces a slow full reinstall.

    source: Docker docs — Building best practices ↗
  • [Commonly asked · senior · coding · common] Write a multi-stage Dockerfile for a Node app and explain why multi-stage builds matter.

    A multi-stage build uses multiple FROM statements: a heavy build stage compiles/installs, then a slim runtime stage copies only the final artifacts. The build toolchain (compilers, dev dependencies) never ships in the final image, so it is smaller and has a smaller attack surface.

    FROM node:20 AS build
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    RUN npm run build && npm prune --omit=dev

    FROM node:20-slim
    WORKDIR /app
    COPY --from=build /app/dist ./dist
    COPY --from=build /app/node_modules ./node_modules
    USER node
    EXPOSE 3000
    CMD ["node", "dist/server.js"]

    The COPY --from=build pulls only built output from the earlier stage; the final image starts from a slim base and runs as the non-root node user.

    Red flag Shipping the full build image with dev dependencies and toolchain, or running as root in the final stage — bigger image, larger attack surface, and a container that can do more damage if compromised.

    source: Docker docs — Multi-stage builds ↗
  • [Commonly asked · mid · concept · common] What is a `.dockerignore` file and why does it matter for both build speed and security?

    .dockerignore lists paths excluded from the build context — the set of files the Docker daemon receives before building. Excluding node_modules, .git, build output, and local env files makes the context smaller, so builds start faster and the cache is less likely to bust on irrelevant changes.

    The security angle: without it, a COPY . . can sweep secrets (.env, .aws/, private keys, .git history) straight into an image layer, where they persist even if a later layer deletes them. So .dockerignore both speeds up builds and keeps secrets out of the image.

    Red flag Believing that a `RUN rm secret` later in the Dockerfile removes the secret — layers are additive, so the file still lives in the earlier layer and can be extracted from the image history.

    source: Docker docs — Building best practices (.dockerignore) ↗
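
Why a later `rm` does not remove a secret can be shown with a toy model of additive layers. This is a sketch under simplifying assumptions, not Docker's storage format: layers are plain dictionaries, `WHITEOUT`, `final_view`, and `shipped_bytes` are hypothetical names, and the file contents are made up.

```python
WHITEOUT = object()  # marker a later layer uses to hide a file from the final view


def final_view(layers):
    """Merge layers in order into the filesystem a running container sees."""
    view = {}
    for layer in layers:
        for path, content in layer.items():
            if content is WHITEOUT:
                view.pop(path, None)  # hidden from the view, not erased from history
            else:
                view[path] = content
    return view


def shipped_bytes(layers):
    """Every layer is stored and pulled, so sizes add up across layers."""
    return sum(
        len(content)
        for layer in layers
        for content in layer.values()
        if content is not WHITEOUT
    )


layers = [
    {"/app/server.js": b"console.log('hi')"},  # COPY server.js
    {"/app/.env": b"API_KEY=hypothetical"},     # COPY .env (the mistake)
    {"/app/.env": WHITEOUT},                    # RUN rm .env
]

print("/app/.env" in final_view(layers))  # the container no longer sees the file
print(layers[1]["/app/.env"])             # but the earlier layer still holds it
print(shipped_bytes(layers))              # and its bytes still count toward image size
```

The same model explains the size advice elsewhere in this section: a cleanup in a separate layer hides files without shrinking the layers that already contain them.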
  • [Commonly asked · mid · concept · common] When would you use docker-compose, and what problem does it solve?

    docker-compose defines and runs a multi-container app from a single declarative YAML file. Instead of starting each container with a long docker run and wiring up networks/volumes by hand, you describe the services (app, db, cache), their images/build contexts, ports, env, volumes, and dependencies, then docker compose up brings the whole stack up on a shared network where services reach each other by service name.

    Its sweet spot is local development and CI — reproducing a realistic multi-service environment (e.g. an API + Postgres + Redis) with one command. It is not an orchestrator; for production scheduling, self-healing, and scaling across many machines you reach for Kubernetes.

    Red flag Pitching docker-compose as a production orchestration tool — it does not give you multi-node scheduling, self-healing, or rolling updates across a cluster.

    source: Docker docs — Docker Compose overview ↗
  • [Commonly asked · senior · trick · common] What is the difference between `CMD` and `ENTRYPOINT` in a Dockerfile?

    Both define what runs when the container starts, but they compose differently. ENTRYPOINT sets the fixed executable; CMD sets default arguments that are easy to override at docker run time.

    With ENTRYPOINT ["python", "app.py"] the container always runs that; anything you pass to docker run is appended as args. With only CMD ["python", "app.py"], passing a command to docker run replaces it entirely. A common pattern is ENTRYPOINT for the binary plus CMD for default flags, so docker run image uses the defaults and docker run image --other-flag overrides just the flags.

    Prefer the exec form (JSON array) over the shell form so signals like SIGTERM reach your process directly for clean shutdown.

    Red flag Using the shell form (`CMD node server.js`) so the app runs as a child of `/bin/sh`, which does not forward `SIGTERM` to it; the container then gets SIGKILLed once the stop timeout expires (10 seconds by default) instead of shutting down gracefully.

    source: Docker docs — Dockerfile reference (CMD / ENTRYPOINT) ↗
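
How the two instructions combine can be modelled in a few lines. This is a sketch of the exec-form rules described above only; `container_argv` is a hypothetical helper, and it ignores the shell form and the `--entrypoint` override.

```python
def container_argv(entrypoint, cmd, run_args):
    """Compose the argv of the container's main process (exec form only).

    Arguments passed after the image name on `docker run` replace CMD;
    ENTRYPOINT, when set, is always prepended.
    """
    args = run_args if run_args else cmd
    return list(entrypoint) + list(args)


# ENTRYPOINT only: run-time arguments are appended.
print(container_argv(["python", "app.py"], [], ["--debug"]))

# CMD only: a run-time command replaces it entirely.
print(container_argv([], ["python", "app.py"], ["sh"]))

# ENTRYPOINT plus CMD: CMD supplies default flags that run-time arguments override.
print(container_argv(["python", "app.py"], ["--port", "8000"], []))
print(container_argv(["python", "app.py"], ["--port", "8000"], ["--port", "9000"]))
```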
  • [Commonly asked · senior · debug · common] Your Docker image is 1.2GB and builds take 10 minutes on every code change. How do you debug and fix it?

    Two separate problems: image size and build time.

    Size: run docker history <image> to see which layers are fat. Usual culprits are a heavy base image (use -slim/-alpine/distroless), build toolchain shipped in the runtime image (fix with a multi-stage build copying only artifacts), and dev dependencies (npm ci --omit=dev). Combine related RUN steps and clean package caches in the same layer so the cleanup actually shrinks the layer.

    Build time on every change: this is almost always cache invalidation from instruction order. Copy and install dependencies before copying source, add a .dockerignore so unrelated files do not change the build context and invalidate the COPY layer, and make sure BuildKit is in use (it is the default builder in recent Docker releases) so independent stages build in parallel. After reordering, only the source copy and the steps after it rebuild on a code edit, and the slow dependency install is skipped.

    Red flag Adding `RUN rm -rf /var/cache/...` as a new layer after the install layer — additive layers mean the bytes still count; the cleanup must happen in the same `RUN` as the install.

    source: Docker docs — Building best practices ↗
  • [Commonly asked · mid · concept · common] What is the difference between a Docker volume and a bind mount, and when do you use each?

    Both persist data outside the container's ephemeral writable layer, but they differ in who owns the storage. A named volume is managed by Docker in its own storage area (/var/lib/docker/volumes/...); you reference it by name, Docker handles the location, and it is the portable, production-friendly default — great for databases and app data that must outlive a container.

    A bind mount maps a specific host path straight into the container. It is tied to the host's directory layout, so it is ideal for local development (mount your source code so edits show up live) but brittle and host-coupled for production.

    Rule of thumb: volumes for data Docker should manage and that must survive container removal; bind mounts for sharing host files into a container during development. A third option, tmpfs, keeps data in memory only — for secrets/scratch that should never hit disk.

    What a strong answer covers
    • Both persist data outside the container's ephemeral writable layer; the difference is who owns the storage.

    • Named volume: Docker-managed, portable, the production default (databases, persistent app data).

    • Bind mount: a specific host path into the container — perfect for live-reloading source in local dev.

    • Bind mounts are host-coupled and brittle for production; volumes abstract the location away.

    • tmpfs mounts live in memory only — for scratch/secret data that must never touch disk.

    Quick self-check

    You want a Postgres container's data to survive container recreation and stay portable across hosts. Use:

    Red flag Relying on a bind mount in production — it couples the container to the host's exact directory layout, so the same image behaves differently (or breaks) on another host; use a named volume so Docker owns the storage.

    source: Docker docs — Volumes ↗
  • [Commonly asked · senior · concept · common] How do containers achieve isolation? What kernel features make a container different from a VM?

    A container is just a regular Linux process that the kernel isolates using two features: namespaces and cgroups. Namespaces scope *what a process can see* — separate PID, network, mount, user, and hostname namespaces make the process believe it has its own process tree, network stack, and filesystem. cgroups scope *what it can use* — CPU, memory, and I/O limits. Together they give the illusion of a private machine while everything shares one host kernel.

    That shared kernel is the key contrast with a VM: a VM runs a full guest OS with its own kernel on top of a hypervisor, so it is heavier (GBs, slow boot) but more strongly isolated. A container shares the host kernel, so it is lightweight (MBs, sub-second start) but the isolation is weaker — a kernel exploit can cross the boundary.

    This is why containers pack densely and start fast, and why you don't run untrusted multi-tenant workloads on bare containers without extra sandboxing.

    What a strong answer covers
    • A container is a host process isolated by namespaces (what it can see) + cgroups (what it can use).

    • Namespaces: PID, network, mount, user, UTS — each process gets its own view of the system.

    • cgroups bound CPU/memory/IO so one container can't starve the others.

    • Containers share the host kernel (light, fast); VMs run a full guest OS + hypervisor (heavy, stronger isolation).

    • Weaker container isolation is why untrusted multi-tenant workloads need extra sandboxing (gVisor, microVMs).

    Quick self-check

    Which pair of Linux kernel features primarily provides container isolation?

    Red flag Describing a container as a 'lightweight VM' — there is no guest OS or hypervisor; it is a host process with kernel-enforced isolation, which is exactly why the isolation boundary is weaker than a VM's.

    source: Docker docs — What is a container? ↗
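
On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns, which gives a quick way to see the "what it can see" half of the answer. The sketch below assumes a Linux host with /proc mounted and returns an empty result anywhere else; `namespace_ids` is a hypothetical helper name.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace name -> kernel identifier for a process (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return ids  # not Linux, no such process, or /proc is unavailable
    for name in sorted(names):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable with the current privileges
    return ids


for name, ident in namespace_ids().items():
    print(name, ident)
```

Two processes that share a namespace report the same identifier for it. Running this once on the host and once inside a container would typically show different identifiers for entries such as pid, net, and mnt, while both still run on the same kernel.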
  • [Commonly asked · mid · concept · occasional] What is the difference between Docker's default bridge network and a user-defined bridge network?

    Both use the bridge driver, but a user-defined bridge adds the feature you almost always want: built-in DNS-based service discovery. Containers on the same user-defined network can reach each other by container name (http://api:3000), because Docker runs an embedded DNS resolver for that network.

    On the default bridge network, name resolution is not provided — containers can only reach each other by IP (or the legacy, deprecated --link), which is fragile because IPs change. User-defined networks also give you better isolation (only containers you attach can talk) and let you attach/detach containers on the fly.

    The practical takeaway: for any multi-container app, create a user-defined bridge (which is exactly what docker-compose does automatically) so services find each other by name rather than chasing IP addresses.

    What a strong answer covers
    • User-defined bridge networks give automatic DNS — reach containers by name.

    • The default bridge has no name resolution (IP only, or deprecated --link).

    • User-defined networks add isolation — only attached containers can communicate.

    • Compose creates a user-defined network for you, which is why services resolve each other by service name.

    • Prefer user-defined bridges for any multi-container app; avoid relying on the default bridge.

    Red flag Expecting container-name DNS resolution to work on the default `bridge` network — it doesn't; you must create a user-defined network (or use compose) to get name-based service discovery.

    source: Docker docs — Networking overview ↗
  • [Commonly asked · senior · debug · occasional] Your container starts and immediately exits with code 0, and you don't know why. How do you debug it?

    Exit code 0 means the main process finished successfully — a container lives exactly as long as its PID 1 runs, so if the command completes, the container stops. This is usually a misconception, not a bug: the image's CMD/ENTRYPOINT ran a one-shot command (or a process that daemonized into the background) instead of a long-running foreground process.

    Debug it: docker ps -a to confirm the exit code, docker logs <container> to see what it printed, and docker inspect <container> for the actual command and config. Then check whether CMD runs a foreground process — a common trap is starting a server that forks into the background, so PID 1 returns and the container exits.

    Fix: make the entrypoint run a long-lived foreground process (e.g. nginx -g 'daemon off;', or run the app directly rather than via a launcher that backgrounds it). For interactive debugging, override the entrypoint: docker run -it --entrypoint sh <image>.

    What a strong answer covers
    • A container runs only as long as its PID 1; exit 0 = the main command completed normally.

    • Usual cause: CMD ran a one-shot command, or a server daemonized into the background so PID 1 returned.

    • Inspect with docker ps -a (exit code), docker logs, and docker inspect (the actual command).

    • Fix: run the process in the foreground (e.g. nginx -g 'daemon off;').

    • Drop into the image to poke around: docker run -it --entrypoint sh <image>.

    Red flag Assuming a clean exit code 0 means something crashed — it means the foreground process finished; the real fix is running a long-lived foreground process as PID 1, not adding restart policies.

    source: Docker docs — Run and manage containers ↗
  • [Commonly asked · mid · concept · occasional] What is a container registry, and what is the danger of deploying images tagged `:latest`?

    A registry (Docker Hub, GHCR, ECR) is the remote store for images: you push built images to it and nodes pull them at deploy time. An image is addressed by registry/repository:tag plus an immutable content digest (sha256:...).

    The :latest tag is the trap. It is just a mutable label, not a guarantee of newness — it points to whatever was last pushed with that tag, and it can be overwritten. So 'deploy :latest' is non-deterministic: two nodes pulling at different times can run different code, you can't tell which build is in production, and rollbacks are ambiguous. It also undermines caching (Docker may skip re-pulling a tag it already has, so you can silently run a stale image).

    The fix: deploy immutable, specific tags (a version or git SHA, e.g. :1.4.2 or :sha-abc123), or pin by digest. Reserve :latest for casual local use only.

    What a strong answer covers
    • A registry stores images; nodes pull by repo:tag plus an immutable sha256 digest.

    • :latest is a mutable pointer, not 'the newest' — it can be overwritten and means different things over time.

    • Deploying :latest is non-deterministic: nodes can run different builds; rollbacks are ambiguous.

    • Pin to a version or git SHA tag (or the digest) so a deploy is reproducible and traceable.

    • It also defeats reliable cache invalidation — you can silently keep running a stale image.

    Quick self-check

    What does the `:latest` tag actually guarantee about an image?

    Red flag Shipping `:latest` to production — it is mutable, so different nodes can run different code and you lose the ability to say exactly which build is live or roll back to a known-good one.

    source: Docker docs — Push and pull / registries ↗
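
The mutable-tag problem can be illustrated with an in-memory stand-in for a registry. This is a toy under stated assumptions: `ToyRegistry` is a hypothetical class, and it hashes the raw bytes directly, whereas a real registry computes the digest over the image manifest.

```python
import hashlib


class ToyRegistry:
    """In-memory stand-in for a registry: tags are mutable, digests are not."""

    def __init__(self):
        self.blobs = {}  # digest -> image bytes
        self.tags = {}   # tag -> digest

    def push(self, tag, image_bytes):
        digest = "sha256:" + hashlib.sha256(image_bytes).hexdigest()
        self.blobs[digest] = image_bytes
        self.tags[tag] = digest  # overwrites whatever the tag pointed to before
        return digest

    def pull(self, ref):
        # A digest reference is resolved directly; a tag goes through the mutable map.
        digest = ref if ref.startswith("sha256:") else self.tags[ref]
        return self.blobs[digest]


registry = ToyRegistry()
first_digest = registry.push("latest", b"build-1")
registry.push("latest", b"build-2")

print(registry.pull("latest"))      # whatever was pushed last under the tag
print(registry.pull(first_digest))  # always the exact build that was pinned
```

Pulling by tag returns a different build after the second push, while pulling by digest keeps returning the first one, which is why a digest or an immutable version tag makes a deploy reproducible.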
  • [Commonly asked · mid · trick · occasional] What is the difference between `COPY` and `ADD` in a Dockerfile, and which should you default to?

    Both copy files into the image, but ADD has two extra, surprising behaviors: it can fetch a remote URL, and it auto-extracts local tar archives into the destination. COPY does exactly one thing — copy local files/directories from the build context — with no magic.

    The guidance (and Docker's own best practice) is to default to COPY because it is explicit and predictable. Reserve ADD for the one case it is genuinely good at: copying-and-extracting a local tarball in a single step. For fetching remote files, prefer an explicit RUN curl/wget (or better, ADD's checksum options) so the intent and caching are clear.

    The trick the interviewer is checking: candidates who use ADD https://... casually may not realize that ADD does more than COPY. It can pull remote content into the image, and it silently auto-extracts local tar archives (archives fetched from a URL are not extracted), leading to surprising image contents.

    What a strong answer covers
    • COPY copies local build-context files only — no surprises.

    • ADD also fetches remote URLs and auto-extracts local tar archives.

    • Default to COPY for predictability (Docker's own best-practice guidance).

    • Use ADD only for its niche win: copy-and-extract a local tarball in one step.

    • For remote downloads prefer explicit RUN curl/wget so caching and intent are clear.

    Red flag Using `ADD` everywhere as a synonym for `COPY` — its auto-extraction of tar archives and URL fetching are silent, surprising behaviors; default to `COPY` and reach for `ADD` only deliberately.

    source: Docker docs — Dockerfile reference (ADD / COPY) ↗