Containers (Docker)
Image vs container, layers and layer caching (why Dockerfile order matters), writing a Dockerfile, multi-stage builds, and docker-compose.
A container packages your app with everything it needs to run — code, runtime, libraries — into one isolated unit that behaves identically on your laptop and in production, ending the “works on my machine” class of bug. The two ideas that make this fast and reproducible are images (immutable blueprints) and layers (cached, stackable filesystem diffs). Get the layer model and almost every Dockerfile best practice follows from it: order, caching, multi-stage builds, and small images are all the same insight applied.
Container vs VM — what you’re actually getting
Before the Dockerfile, fix the mental model of what a container is. A VM runs a full guest operating system on virtualized hardware via a hypervisor; a container is just an isolated process on the host’s own kernel, walled off with Linux namespaces (its own view of processes, network, filesystem) and cgroups (its CPU/memory budget). That difference is why a container starts in milliseconds and ships as megabytes while a VM boots in seconds and ships as gigabytes.
| Dimension | Container | Virtual machine |
|---|---|---|
| Isolation unit | a process (shares host kernel) | a full guest OS on a hypervisor |
| Startup | milliseconds | seconds to minutes |
| Size on disk | MBs (layers, shared) | GBs (whole OS image) |
| Density | hundreds per host | tens per host |
| Isolation strength | weaker — kernel is shared | stronger — separate kernels |
| Reach for it | microservices, CI, packing many apps | different OS/kernel, hard multi-tenant isolation |
Images, containers, and the layer cache
The idea that should anchor your mental model: a Docker build processes the Dockerfile's instructions top to bottom, and each instruction either hits the cache (the existing layer is reused) or misses (that layer and every layer after it are rebuilt). The single biggest speed win in a Dockerfile is ordering instructions from least to most frequently changed, so a code edit doesn't invalidate your dependency install.
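A minimal sketch of that ordering rule for a Node app. It assumes a project with a package.json, a lockfile, and a `build` script, none of which are defined in this guide. The comments mark which steps a source-only edit leaves cached.

```dockerfile
FROM node:24-slim
WORKDIR /app

# Changes rarely: this layer and the install below stay cached
# until package.json or the lockfile changes.
COPY package*.json ./
RUN npm ci

# Changes often: a source edit misses the cache here, so only this
# step and the ones after it are rebuilt.
COPY . .
RUN npm run build

# Anti-pattern for comparison (do not do this): placing `COPY . .`
# above `RUN npm ci` makes every source edit re-run the install.
```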
Writing a Dockerfile — the instructions
You can get a long way with seven instructions. FROM sets the base image; WORKDIR sets (and creates) the working directory; COPY brings files into the image; RUN executes a build-time command and bakes the result into a layer; EXPOSE documents the port (it doesn’t publish it); CMD is the default command the container runs; ENV sets environment variables.
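As an illustration, here is one small single-stage Dockerfile that uses all seven instructions. The app layout (a `server.js` at the project root listening on port 3000) and the image name `myapp` are assumptions made for the sketch.

```dockerfile
# FROM: the base image every later layer is stacked on
FROM node:24-slim
# WORKDIR: created if missing; later relative paths resolve against it
WORKDIR /app
# ENV: set at build time and still present when the container runs
ENV NODE_ENV=production
# COPY + RUN: dependency manifests first, then install into a layer
COPY package*.json ./
RUN npm ci --omit=dev
# COPY: the application source (hypothetical layout, server.js at the root)
COPY . .
# EXPOSE: documentation only; the port is published at run time with -p
EXPOSE 3000
# CMD: the default process, written in exec form
CMD ["node", "server.js"]
```

Built with `docker build -t myapp .`, it would be started with `docker run -p 3000:3000 myapp`; the `-p` flag, not `EXPOSE`, is what makes the port reachable from the host.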
Multi-stage builds + non-root
The single best practice for production images: build in a fat stage with all your toolchain, then copy only the compiled output into a slim runtime stage. The resulting image ships no compilers, no dev dependencies, no source — smaller to pull, faster to start, and a far smaller attack surface. Running as a non-root user is the other must-do: by default a container process runs as root, so a container escape lands the attacker as root on the host.
```dockerfile
# ---- Stage 1: build ----
FROM node:24-slim AS build
WORKDIR /app
# Deps first: cached until the lockfile changes
COPY package*.json ./
RUN npm ci
# Then source, then compile
COPY . .
RUN npm run build && npm prune --omit=dev

# ---- Stage 2: runtime ----
FROM node:24-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
# Copy only what runtime needs from the build stage
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY --from=build /app/package.json ./
# Drop root: node:* images ship a non-root 'node' user
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
```
The runtime stage starts clean from node:24-slim and pulls in only node_modules, dist, and package.json from the build stage, so the source tree, build cache, and any dev tooling never reach the final image. USER node ensures the app runs unprivileged. Pair this with a .dockerignore:
```
node_modules
npm-debug.log
.git
.env
dist
Dockerfile
.dockerignore
```
Excluding node_modules and dist from the context means a stale local build can't sneak into the image and bust your cache, and excluding .env/.git keeps secrets and history out of the image entirely.
docker-compose for local dev
For multi-container local development — your app plus a database plus a cache — docker-compose declares the whole stack in one YAML file and wires up a shared network so services reach each other by name.
```yaml
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgres://app:secret@db:5432/app
      REDIS_URL: redis://cache:6379
    depends_on:
      - db
      - cache
  db:
    image: postgres:17
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app
    volumes:
      - pgdata:/var/lib/postgresql/data
  cache:
    image: redis:7
volumes:
  pgdata:
```
docker compose up builds and starts all three. The app reaches Postgres at host db and Redis at cache, because compose's built-in DNS resolves service names on the shared network. The named pgdata volume persists the database across compose down/up, so you don't lose data when you restart the stack.
Knowledge check
- (medium) Why put `COPY package.json` + install BEFORE `COPY . .` in a Dockerfile?
- (medium) Multi-stage builds primarily give you:
Interview questions
What gets asked on this topic, with how to approach each question, the follow-ups, and the trap.
What is the difference between a Docker image and a container?
An image is the blueprint — an immutable, read-only stack of layers (filesystem + metadata like the default command) built from a Dockerfile. A container is a running (or stopped) instance of an image: Docker adds a thin writable layer on top of the read-only image layers and gives it an isolated process, network, and mount namespace.
The analogy: image is to container as a class is to an object, or a program on disk is to a process. You can spin up many containers from one image; each gets its own writable layer, so changes inside one container do not affect the image or the other containers.
Follow-ups they push on
- What happens to data written inside a container when it is removed?
- Why are image layers read-only and the container layer writable?
Red flag: Saying data persists in the image after a container writes to it. Writes land in the container's ephemeral writable layer and vanish when the container is removed unless you mount a volume.
source: Docker docs (Images and layers)
Why does the order of instructions in a Dockerfile matter? How does layer caching work?
Each Dockerfile instruction creates a layer. On rebuild, Docker reuses a cached layer as long as that instruction and everything it depends on are unchanged; the first instruction that changes invalidates that layer and every layer after it.
So you order from least-frequently-changing to most-frequently-changing. The classic example for a Node app: `COPY package.json` then `RUN npm install` BEFORE `COPY . .`. Dependencies change rarely, so the expensive `npm install` layer stays cached across most builds; only the cheap source-copy layer rebuilds when you edit code. If you `COPY . .` first, every source edit busts the cache and reinstalls all dependencies.
Follow-ups they push on
- Where would you put `COPY package.json` vs `COPY . .` and why?
- How does a `.dockerignore` file interact with build caching?
Red flag: Copying the whole source tree before installing dependencies. Every code change then invalidates the dependency-install layer and forces a slow full reinstall.
source: Docker docs (Building best practices)
Write a multi-stage Dockerfile for a Node app and explain why multi-stage builds matter.
A multi-stage build uses multiple `FROM` statements: a heavy build stage compiles/installs, then a slim runtime stage copies only the final artifacts. The build toolchain (compilers, dev dependencies) never ships in the final image, so it is smaller and has a smaller attack surface.
```dockerfile
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Compile, then drop dev dependencies so they are not copied into the runtime stage
RUN npm run build && npm prune --omit=dev

FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/server.js"]
```
The `COPY --from=build` pulls only the built output and pruned dependencies from the earlier stage; the final image starts from a slim base and runs as the non-root `node` user.
Follow-ups they push on
- Why run as a non-root user in the final stage?
- How would you get an even smaller image (distroless / alpine)?
Red flag: Shipping the full build image with dev dependencies and toolchain, or running as root in the final stage. The result is a bigger image, a larger attack surface, and a container that can do more damage if compromised.
source: Docker docs (Multi-stage builds)
What is a `.dockerignore` file and why does it matter for both build speed and security?
`.dockerignore` lists paths excluded from the build context, the set of files the Docker daemon receives before building. Excluding `node_modules`, `.git`, build output, and local env files makes the context smaller, so builds start faster and the cache is less likely to bust on irrelevant changes.
The security angle: without it, a `COPY . .` can sweep secrets (`.env`, `.aws/`, private keys, `.git` history) straight into an image layer, where they persist even if a later layer deletes them. So `.dockerignore` both speeds up builds and keeps secrets out of the image.
Follow-ups they push on
- Why does deleting a secret in a later layer not actually remove it from the image?
- What belongs in a typical `.dockerignore`?
Red flag: Believing that a `RUN rm secret` later in the Dockerfile removes the secret. Layers are additive, so the file still lives in the earlier layer and can be extracted from the image history.
source: Docker docs (Building best practices, .dockerignore)
When would you use docker-compose, and what problem does it solve?
docker-compose defines and runs a multi-container app from a single declarative YAML file. Instead of starting each container with a long `docker run` and wiring up networks/volumes by hand, you describe the services (app, db, cache), their images/build contexts, ports, env, volumes, and dependencies, then `docker compose up` brings the whole stack up on a shared network where services reach each other by service name.
Its sweet spot is local development and CI: reproducing a realistic multi-service environment (e.g. an API + Postgres + Redis) with one command. It is not an orchestrator; for production scheduling, self-healing, and scaling across many machines you reach for Kubernetes.
Follow-ups they push on
- How do services in a compose file discover each other?
- Why is compose not a substitute for Kubernetes in production?
Red flag: Pitching docker-compose as a production orchestration tool. It does not give you multi-node scheduling, self-healing, or rolling updates across a cluster.
source: Docker docs (Docker Compose overview)
What is the difference between `CMD` and `ENTRYPOINT` in a Dockerfile?
Both define what runs when the container starts, but they compose differently. `ENTRYPOINT` sets the fixed executable; `CMD` sets default arguments that are easy to override at `docker run` time.
With `ENTRYPOINT ["python", "app.py"]` the container always runs that; anything you pass to `docker run` is appended as args. With only `CMD ["python", "app.py"]`, passing a command to `docker run` replaces it entirely. A common pattern is `ENTRYPOINT` for the binary plus `CMD` for default flags, so `docker run image` uses the defaults and `docker run image --other-flag` overrides just the flags.
Prefer the exec form (JSON array) over the shell form so signals like `SIGTERM` reach your process directly for clean shutdown.
Follow-ups they push on
- Why does the exec form matter for graceful shutdown / signal handling?
- How do `ENTRYPOINT` and `CMD` combine when both are present?
Red flag: Using the shell form (`CMD node server.js`) so the app runs as a child of `/bin/sh`, which swallows `SIGTERM`. The container then gets SIGKILLed on stop instead of shutting down gracefully.
source: Docker docs (Dockerfile reference, CMD / ENTRYPOINT)
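The `ENTRYPOINT` plus `CMD` pattern from the answer above can be sketched as follows. The file `server.js` and its `--port` flag are hypothetical, chosen only to make the argument handling visible.

```dockerfile
FROM node:24-slim
WORKDIR /app
COPY server.js ./
# Exec form: node runs as PID 1 and receives SIGTERM directly
# (the app itself still needs a SIGTERM handler to shut down cleanly)
ENTRYPOINT ["node", "server.js"]
# Default arguments, appended to ENTRYPOINT when `docker run` passes none
CMD ["--port", "3000"]
```

With this image, `docker run image` would execute `node server.js --port 3000`, while `docker run image --port 8080` would replace only the `CMD` part and execute `node server.js --port 8080`.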
Your Docker image is 1.2GB and builds take 10 minutes on every code change. How do you debug and fix it?
Two separate problems: image size and build time.
Size: run `docker history <image>` to see which layers are fat. Usual culprits are a heavy base image (use `-slim`/`-alpine`/distroless), build toolchain shipped in the runtime image (fix with a multi-stage build copying only artifacts), and dev dependencies (`npm ci --omit=dev`). Combine related `RUN` steps and clean package caches in the same layer so the cleanup actually shrinks the layer.
Build time on every change: this is almost always cache invalidation from instruction order. Copy and install dependencies before copying source, add a `.dockerignore` so unrelated files do not bust the context, and enable BuildKit so independent stages build in parallel. After reordering, only the source layer rebuilds on a code edit, dropping the loop from minutes to seconds.
Follow-ups they push on
- Which tool shows you per-layer size, and what do you look for?
- Why does cleaning a cache in a separate `RUN` not reduce image size?
Red flag: Adding `RUN rm -rf /var/cache/...` as a new layer after the install layer. Additive layers mean the bytes still count; the cleanup must happen in the same `RUN` as the install.
source: Docker docs (Building best practices)
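A small sketch of the same-layer cleanup rule, using a Debian base and `curl` as stand-ins (both are assumptions; the same shape applies to other package managers).

```dockerfile
FROM debian:bookworm-slim

# Install and clean up in ONE RUN, so the package index never lands in a layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Anti-pattern for comparison: a separate cleanup step adds a new layer,
# but the bytes are still stored in the install layer above it.
# RUN apt-get update && apt-get install -y curl
# RUN rm -rf /var/lib/apt/lists/*
```

Running `docker history <image>` on both variants is a quick way to compare the size of the install layer.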
What is the difference between a Docker volume and a bind mount, and when do you use each?
Both persist data outside the container's ephemeral writable layer, but they differ in who owns the storage. A named volume is managed by Docker in its own storage area (`/var/lib/docker/volumes/...`); you reference it by name, Docker handles the location, and it is the portable, production-friendly default, great for databases and app data that must outlive a container.
A bind mount maps a specific host path straight into the container. It is tied to the host's directory layout, so it is ideal for local development (mount your source code so edits show up live) but brittle and host-coupled for production.
Rule of thumb: volumes for data Docker should manage and that must survive container removal; bind mounts for sharing host files into a container during development. A third option, `tmpfs`, keeps data in memory only, for secrets/scratch that should never hit disk.
What a strong answer covers
- Both outlive the container's ephemeral writable layer; the difference is who owns the storage.
- Named volume: Docker-managed, portable, the production default (databases, persistent app data).
- Bind mount: a specific host path into the container, perfect for live-reloading source in local dev.
- Bind mounts are host-coupled and brittle for production; volumes abstract the location away.
- `tmpfs` mounts live in memory only, for scratch/secret data that must never touch disk.
Quick self-check: You want a Postgres container's data to survive container recreation and stay portable across hosts. Use:
- A named volume is correct: Docker manages the storage and location, so the data persists and the setup is portable.
- A bind mount works on that host, but couples the container to a specific host path, so it is not portable.
- The container's writable layer is deleted with the container, so data does not survive recreation.
- tmpfs is in-memory and vanishes on stop, the opposite of durable persistence.
Follow-ups they push on
- Why is a bind mount a poor choice for production data persistence?
- Where does a named volume actually live, and why does that make it portable?
- When would you reach for a tmpfs mount?
Red flag: Relying on a bind mount in production. It couples the container to the host's exact directory layout, so the same image behaves differently (or breaks) on another host; use a named volume so Docker owns the storage.
source: Docker docs (Volumes)
How do containers achieve isolation? What kernel features make a container different from a VM?
A container is just a regular Linux process that the kernel isolates using two features: namespaces and cgroups. Namespaces scope *what a process can see* — separate PID, network, mount, user, and hostname namespaces make the process believe it has its own process tree, network stack, and filesystem. cgroups scope *what it can use* — CPU, memory, and I/O limits. Together they give the illusion of a private machine while everything shares one host kernel.
That shared kernel is the key contrast with a VM: a VM runs a full guest OS with its own kernel on top of a hypervisor, so it is heavier (GBs, slow boot) but more strongly isolated. A container shares the host kernel, so it is lightweight (MBs, sub-second start) but the isolation is weaker — a kernel exploit can cross the boundary.
This is why containers pack densely and start fast, and why you don't run untrusted multi-tenant workloads on bare containers without extra sandboxing.
What a strong answer covers
- A container is a host process isolated by namespaces (what it can see) + cgroups (what it can use).
- Namespaces: PID, network, mount, user, UTS; each process gets its own view of the system.
- cgroups bound CPU/memory/IO so one container can't starve the others.
- Containers share the host kernel (light, fast); VMs run a full guest OS + hypervisor (heavy, stronger isolation).
- Weaker container isolation is why untrusted multi-tenant workloads need extra sandboxing (gVisor, microVMs).
Quick self-check: Which pair of Linux kernel features primarily provides container isolation?
- Namespaces and cgroups is correct: namespaces isolate what a process sees; cgroups limit what it can consume.
- A hypervisor and a guest OS describe a VM, not a container; containers have neither.
- chroot only scopes the filesystem root; it is far short of full container isolation.
- seccomp hardens syscalls and TLS is unrelated; neither provides the core view/resource isolation.
Follow-ups they push on
- What do namespaces isolate vs what cgroups limit?
- Why does sharing the host kernel make containers faster but less isolated than VMs?
- When would you still prefer a VM (or microVM) over a plain container?
Red flag: Describing a container as a 'lightweight VM'. There is no guest OS or hypervisor; it is a host process with kernel-enforced isolation, which is exactly why the isolation boundary is weaker than a VM's.
source: Docker docs (What is a container?)
What is the difference between Docker's default bridge network and a user-defined bridge network?
Both use the `bridge` driver, but a user-defined bridge adds the feature you almost always want: built-in DNS-based service discovery. Containers on the same user-defined network can reach each other by container name (`http://api:3000`), because Docker runs an embedded DNS resolver for that network.
On the default `bridge` network, name resolution is not provided: containers can only reach each other by IP (or the legacy, deprecated `--link`), which is fragile because IPs change. User-defined networks also give you better isolation (only containers you attach can talk) and let you attach/detach containers on the fly.
The practical takeaway: for any multi-container app, create a user-defined bridge (which is exactly what docker-compose does automatically) so services find each other by name rather than chasing IP addresses.
What a strong answer covers
- User-defined bridge networks give automatic DNS: reach containers by name.
- The default bridge has no name resolution (IP only, or deprecated `--link`).
- User-defined networks add isolation: only attached containers can communicate.
- Compose creates a user-defined network for you, which is why services resolve each other by service name.
- Prefer user-defined bridges for any multi-container app; avoid relying on the default bridge.
Follow-ups they push on
- Why is reaching containers by IP on the default bridge fragile?
- How does docker-compose use this under the hood?
- What does the `host` network driver change about all this?
Red flag: Expecting container-name DNS resolution to work on the default `bridge` network. It doesn't; you must create a user-defined network (or use compose) to get name-based service discovery.
source: Docker docs (Networking overview)
Your container starts and immediately exits with code 0, and you don't know why. How do you debug it?
Exit code 0 means the main process finished successfully. A container lives exactly as long as its PID 1 runs, so if the command completes, the container stops. This is usually a misconception, not a bug: the image's `CMD`/`ENTRYPOINT` ran a one-shot command (or a process that daemonized into the background) instead of a long-running foreground process.
Debug it: `docker ps -a` to confirm the exit code, `docker logs <container>` to see what it printed, and `docker inspect <container>` for the actual command and config. Then check whether `CMD` runs a foreground process; a common trap is starting a server that forks into the background, so PID 1 returns and the container exits.
Fix: make the entrypoint run a long-lived foreground process (e.g. `nginx -g 'daemon off;'`, or run the app directly rather than via a launcher that backgrounds it). For interactive debugging, override the entrypoint: `docker run -it --entrypoint sh <image>`.
What a strong answer covers
- A container runs only as long as its PID 1; exit 0 = the main command completed normally.
- Usual cause: `CMD` ran a one-shot command, or a server daemonized into the background so PID 1 returned.
- Inspect with `docker ps -a` (exit code), `docker logs`, and `docker inspect` (the actual command).
- Fix: run the process in the foreground (e.g. `nginx -g 'daemon off;'`).
- Drop into the image to poke around: `docker run -it --entrypoint sh <image>`.
Follow-ups they push on
- Why does a server that forks into the background cause the container to exit?
- How do you get a shell inside an image whose entrypoint exits immediately?
- How is exit code 0 different in meaning from 137 or 1?
Red flag: Assuming a clean exit code 0 means something crashed. It means the foreground process finished; the real fix is running a long-lived foreground process as PID 1, not adding restart policies.
source: Docker docs (Run and manage containers)
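The foreground fix from the answer above, sketched as a Dockerfile. It assumes nginx installed from the Debian package repository with its stock configuration, which daemonizes unless told otherwise.

```dockerfile
FROM debian:bookworm-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends nginx \
 && rm -rf /var/lib/apt/lists/*
EXPOSE 80
# `CMD ["nginx"]` would daemonize: the launcher process returns,
# PID 1 exits with code 0, and the container stops.
CMD ["nginx", "-g", "daemon off;"]
```

With `daemon off;` the nginx master process stays in the foreground as PID 1, so the container keeps running until that process is stopped.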
What is a container registry, and what is the danger of deploying images tagged `:latest`?
A registry (Docker Hub, GHCR, ECR) is the remote store for images: you `push` built images to it and nodes `pull` them at deploy time. An image is addressed by `registry/repository:tag` plus an immutable content digest (`sha256:...`).
The `:latest` tag is the trap. It is just a mutable label, not a guarantee of newness: it points to whatever was last pushed with that tag, and it can be overwritten. So 'deploy `:latest`' is non-deterministic: two nodes pulling at different times can run different code, you can't tell which build is in production, and rollbacks are ambiguous. It also undermines caching (Docker may skip re-pulling a tag it already has, so you can silently run a stale image).
The fix: deploy immutable, specific tags (a version or git SHA, e.g. `:1.4.2` or `:sha-abc123`), or pin by digest. Reserve `:latest` for casual local use only.
What a strong answer covers
- A registry stores images; nodes `pull` by `repo:tag` plus an immutable `sha256` digest.
- `:latest` is a mutable pointer, not 'the newest'; it can be overwritten and means different things over time.
- Deploying `:latest` is non-deterministic: nodes can run different builds; rollbacks are ambiguous.
- Pin to a version or git SHA tag (or the digest) so a deploy is reproducible and traceable.
- It also defeats reliable cache invalidation; you can silently keep running a stale image.
Quick self-check: What does the `:latest` tag actually guarantee about an image?
- Nothing beyond being a label is correct: `latest` is just a tag like any other; it can point to an old or arbitrary build.
- It is the newest build only if someone re-tags every new build as latest; the tag itself enforces nothing.
- Immutability is backwards: the immutable identifier is the sha256 digest, not the `latest` tag.
- A fresh pull is not guaranteed: pull behavior depends on pull policy/cache, not the tag name; Docker may reuse a cached `latest`.
Follow-ups they push on
- Why is pinning by digest the strongest guarantee of running an exact image?
- How does `:latest` make a rollback ambiguous?
- What naming scheme would you use for production image tags?
Red flag: Shipping `:latest` to production. It is mutable, so different nodes can run different code and you lose the ability to say exactly which build is live or roll back to a known-good one.
source: Docker docs (Push and pull / registries)
What is the difference between `COPY` and `ADD` in a Dockerfile, and which should you default to?
Both copy files into the image, but `ADD` has two extra, surprising behaviors: it can fetch a remote URL, and it auto-extracts local tar archives into the destination. `COPY` does exactly one thing, copying local files/directories from the build context, with no magic.
The guidance (and Docker's own best practice) is to default to `COPY` because it is explicit and predictable. Reserve `ADD` for the one case it is genuinely good at: copying-and-extracting a local tarball in a single step. For fetching remote files, prefer an explicit `RUN curl`/`wget` (or better, `ADD`'s checksum options) so the intent and caching are clear.
The trick the interviewer is checking: candidates who use `ADD https://...` casually may not realize it bypasses the clarity of `COPY` and can silently auto-extract archives, leading to surprising image contents.
What a strong answer covers
- `COPY` copies local build-context files only, with no surprises.
- `ADD` also fetches remote URLs and auto-extracts local tar archives.
- Default to `COPY` for predictability (Docker's own best-practice guidance).
- Use `ADD` only for its niche win: copy-and-extract a local tarball in one step.
- For remote downloads prefer explicit `RUN curl`/`wget` so caching and intent are clear.
Follow-ups they push on
- What surprising thing happens if you `ADD` a local `.tar.gz` file?
- Why is `RUN curl` often preferred over `ADD <url>` for remote files?
- When is `ADD` genuinely the right choice?
Red flag: Using `ADD` everywhere as a synonym for `COPY`. Its auto-extraction of tar archives and URL fetching are silent, surprising behaviors; default to `COPY` and reach for `ADD` only deliberately.
source: Docker docs (Dockerfile reference, ADD / COPY)
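To make the `COPY` versus `ADD` contrast concrete, here is a short sketch. The `config/` directory and `vendor.tar.gz` archive are hypothetical files assumed to exist in the build context.

```dockerfile
FROM debian:bookworm-slim
WORKDIR /opt/app

# COPY: plain copy from the build context; an archive stays an archive.
COPY config/ ./config/
COPY vendor.tar.gz ./archives/

# ADD with a local tar archive: unpacked into the destination directory.
ADD vendor.tar.gz ./vendor/
```

After the build, `/opt/app/archives/` would hold the untouched `vendor.tar.gz`, while `/opt/app/vendor/` would hold its extracted contents, which is the silent behavior the answer above warns about.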