CI/CD

Pipeline stages (build → test → deploy) and writing a basic GitHub Actions workflow.

CI/CD automates the path from a pushed commit to running software: CI (continuous integration) builds and tests every change so problems surface in minutes, and CD (continuous delivery/deployment) takes a green build and ships it. The mental model is a pipeline of stages — build → test → deploy — where a failure at any stage stops the line before bad code reaches users. The payoff is fast feedback: a broken build is caught minutes after the commit, while the author still has the change in their head, instead of days later in a manual QA pass.

The stages: build → test → deploy

Each stage gates the next. The order matters because each stage is cheaper and faster than the one after it: running the cheap checks first means most failures are caught early, before any time is spent on the slower stages (fail fast).

[Figure: a git push (PR / main) flows left to right through Build (compile · lint · image), Test (unit → integ → e2e), and Deploy (staging → prod). The same artifact is promoted from stage to stage, and any failure stops the line.]
FIG 1 · the pipeline. A push triggers the pipeline; each stage must go green to unlock the next, and only a fully-tested artifact reaches deploy. A red stage stops the line.
| Stage | What runs | Where it runs | Fails the build when |
| --- | --- | --- | --- |
| Build | Compile, install deps, lint, build the image/artifact | On a runner, on every push/PR | Code doesn't compile, lint errors, broken deps |
| Test | Unit → integration → e2e (cheap-to-expensive) | Runner, often parallel matrix across versions | Any test fails or coverage drops below the gate |
| Deploy | Push the tested artifact to an environment | Runner with deploy creds → staging/prod | Smoke checks fail post-deploy (→ auto-rollback) |
Cheap, fast checks first; the slow, costly deploy only runs on a fully-green build.

A healthy pipeline deploys the same artifact through environments — build once in the build stage, then promote that identical image through staging to production. Rebuilding per-environment risks shipping something subtly different from what your tests validated: a dependency that resolved to a newer patch version, a different build timestamp, a config baked at the wrong moment. “Test what you ship, ship what you tested” is only true if it’s literally the same bytes.
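
As a rough illustration of build-once-then-promote, the sketch below assumes the image was already pushed under the commit SHA tag and that `./scripts/deploy.sh` is a hypothetical script taking an environment name and an image reference. The workflow and job names are also made up for the example. Both deploy jobs point at the same image reference, so nothing is rebuilt between staging and production.

```yaml
# Sketch only: the workflow name, job names, and ./scripts/deploy.sh are hypothetical.
name: Promote

on:
  workflow_dispatch:

env:
  # One image reference, shared by every deploy job below.
  IMAGE: ghcr.io/${{ github.repository }}:${{ github.sha }}

jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging "$IMAGE"

  deploy-prod:
    needs: deploy-staging          # production waits for a green staging deploy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production "$IMAGE"
```

A tag only guarantees identical bytes if it is never overwritten; pinning the deploy to the image digest would be a stricter variant of the same idea.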

A GitHub Actions workflow

A workflow is a YAML file in .github/workflows/. You declare what triggers it (on:), then one or more jobs, each a sequence of steps. Here is a complete CI workflow that lints, tests, and (only on the main branch) builds and pushes a Docker image.

.github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:           # also run on every PR

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [20, 22, 24]   # test across Node versions in parallel
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
          cache: npm
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --coverage

  build-image:
    needs: test                          # only after every test job is green
    if: github.ref == 'refs/heads/main'  # and only on main
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

The test job runs on every push and PR, fanning out across three Node versions via the matrix. The build-image job declares needs: test, so it waits for all three matrix legs to pass, and the if: guard limits it to the main branch — PRs get tested but never publish an image. Note ${{ ... }} is GitHub Actions’ expression syntax (contexts, secrets, matrix values), and secrets.GITHUB_TOKEN is an auto-provisioned credential scoped by the permissions: block.

Interview questions

What gets asked on this topic, with how to approach each question and the trap to watch for. Company tags are best-effort & sourced.

  • [Commonly asked · mid · concept · very common] What is the difference between continuous integration, continuous delivery, and continuous deployment?

    Continuous integration (CI): developers merge to a shared branch frequently, and every push automatically builds and runs the test suite, so integration problems surface in minutes, not at a big-bang merge.

    Continuous delivery (CD): every change that passes CI is automatically built into a deployable, release-ready artifact and pushed through environments up to a staging gate — but the final push to production is a manual button.

    Continuous deployment: the same pipeline, with the manual gate removed — every change that passes all automated checks goes straight to production, no human in the loop. The distinction people get wrong is delivery (human approves the prod release) vs deployment (fully automated to prod).

    Red flag: Using 'continuous delivery' and 'continuous deployment' interchangeably. The difference is whether a human approves the production release.

    source: GitHub docs — About continuous integration ↗
  • [Commonly asked · mid · coding · very common] Write a basic GitHub Actions workflow that runs tests on every pull request. Explain the trigger, jobs, and steps.

    A workflow is YAML in .github/workflows/. The top-level on sets the trigger, jobs are units that run on a runner, and each job has steps.

    name: CI
    on:
      pull_request:
        branches: [main]
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
          - run: npm ci
          - run: npm test

    on: pull_request triggers on every PR to main; the single test job runs on a fresh Ubuntu runner; steps check out the code, set up Node, install deps deterministically with npm ci, and run the suite. Jobs run in parallel by default; needs: makes one wait on another.

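    Red flag: Forgetting `actions/checkout` (the runner starts empty, so the build has no source), or using `npm install` instead of `npm ci`. `npm install` may resolve new versions and rewrite the lockfile, whereas `npm ci` installs exactly what the lockfile records and fails if it disagrees with package.json, so builds stay reproducible.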

    source: GitHub docs — Writing workflows / quickstart ↗
  • [Commonly asked · mid · concept · common] Explain the typical stages of a CI/CD pipeline: build, test, deploy. What runs where?

    Build: compile/transpile, install dependencies, and produce a versioned, immutable artifact (a binary, a bundle, or — most commonly — a container image) pushed to a registry. The key principle is build once and promote that same artifact through every environment.

    Test: run fast unit tests first (fail early), then integration tests, then optionally end-to-end tests, plus quality and security scans (lint, SAST, dependency/vulnerability scan). Order from cheapest/fastest to slowest so the pipeline fails fast.

    Deploy: ship the already-built artifact to staging, run smoke tests, then promote to production with a rollout strategy (rolling/blue-green/canary) and health checks that can trigger automatic rollback. Building a fresh artifact per environment is the anti-pattern — you would no longer be testing what you ship.

    Red flag: Rebuilding the artifact separately for staging and production. You then deploy something you never actually tested, defeating the point of the pipeline.

    source: GitHub docs — About continuous deployment ↗
  • [Commonly asked · senior · concept · common] How do you handle secrets (API keys, deploy credentials) in a CI/CD pipeline?

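    Never hardcode secrets in source, the workflow file, or build logs. Inject them at runtime from a secret store: GitHub Actions encrypted secrets / environments, or an external manager like HashiCorp Vault, AWS Secrets Manager, or a cloud key vault. In GitHub Actions a step only sees a secret that the workflow maps in explicitly (for example through env:), and the runner masks the exact value in logs. Masking is best-effort: a transformed copy of the value (say, base64-encoded) can still leak, so avoid printing secrets at all.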

    Stronger still: prefer short-lived, scoped credentials over long-lived static keys — for cloud deploys, use OIDC so the workflow exchanges its identity token for temporary cloud credentials, eliminating stored long-lived keys entirely. Scope secrets to the environment that needs them and gate production secrets behind required reviewers. And remember a secret echoed into a log or committed to git is compromised forever — rotate it.
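
    One way this could look for an AWS deploy, as a sketch under stated assumptions: the role ARN, region, secret name `API_KEY`, and `./scripts/deploy.sh` are placeholders, and an environment named `production` is assumed to exist in the repository settings. The job requests an OIDC token and exchanges it for temporary credentials, so no long-lived AWS key is stored.

    ```yaml
    # Sketch only: the role ARN, region, secret name, and deploy script are placeholders.
    jobs:
      deploy:
        runs-on: ubuntu-latest
        environment: production        # environment-scoped secrets and reviewers apply here
        permissions:
          id-token: write              # lets the job request an OIDC token
          contents: read
        steps:
          - uses: actions/checkout@v4
          - uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: arn:aws:iam::123456789012:role/example-deploy-role
              aws-region: us-east-1
          - run: ./scripts/deploy.sh
            env:
              API_KEY: ${{ secrets.API_KEY }}   # injected at runtime, masked in logs
    ```

    The cloud side still needs a trust policy that accepts tokens from this repository; that setup is outside the workflow file.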

    Red flag: Putting credentials in the repo or in plain workflow env, or echoing a secret in a debug step. Once it lands in git history or a log it must be treated as permanently compromised and rotated.

    source: GitHub docs — Using secrets in GitHub Actions ↗
  • [Commonly asked · senior · concept · common] Compare blue-green and canary deployment strategies. When would you choose each?

    Blue-green runs two full environments: blue (current) serves all traffic while green (new) is deployed and verified, then you flip traffic to green at once. Rollback is instant — flip back to blue. Cost: you run double the infrastructure during the cutover, and a bad release hits 100% of users the moment you switch.

    Canary releases the new version to a small slice of traffic (say 5%), watches error rates and latency, then gradually ramps to 100%. It limits blast radius and catches problems with real traffic before everyone is exposed, but it is more complex (traffic splitting, automated metric analysis) and the rollout is slower.

    Pick blue-green when you want a clean, instant, all-or-nothing switch and can afford duplicate capacity; pick canary when blast-radius control matters and you have the observability to judge a partial rollout.

    Red flag: Calling a deployment a 'canary' when there is no automated metric analysis gating the ramp. Without watching error/latency on the small slice, you have just slowed down a full rollout, not limited blast radius.

    source: AWS — Blue/Green vs Canary deployment strategies ↗
  • [Commonly asked · senior · debug · common] Your CI build passes locally but fails intermittently in the pipeline. How do you approach a flaky build?

    Flakiness almost always comes from hidden non-determinism. Hunt the usual sources: tests that depend on execution order or shared mutable state; reliance on real time/timezone, random seeds, or wall-clock sleeps instead of waiting on a condition; tests hitting real networks/external services; and concurrency races. The 'works locally' clue points at environment differences — different dependency versions, missing lockfile pinning, or fewer CPUs on the runner exposing a race.

    Approach: make it reproducible (run the suite repeatedly, randomize order, run in a clean container matching CI), then isolate the offending test and fix the root cause. Pin dependencies with a lockfile and npm ci, mock external calls, and replace sleeps with explicit waits. Blanket auto-retry hides flakes and erodes trust in the suite — fix, do not paper over.
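
    The "run the suite repeatedly" step can be sketched as an on-demand workflow. This assumes an npm project whose tests run with `npm test`; the workflow name, job name, and attempt count are arbitrary choices for the example.

    ```yaml
    # Sketch only: assumes an npm project whose tests run with `npm test`.
    name: Flake hunt

    on:
      workflow_dispatch:      # run on demand, not on every push

    jobs:
      repeat:
        runs-on: ubuntu-latest
        strategy:
          fail-fast: false    # let every attempt finish so the failure rate is visible
          matrix:
            attempt: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
          - run: npm ci
          - run: npm test
    ```

    Each matrix leg is an identical run on a fresh runner, so a test that fails in only some legs is a strong flake candidate. This is a diagnostic, not a fix: the root cause still has to be found in the failing test.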

    Red flag: Slapping an automatic retry on the whole suite so red turns green. The underlying race or shared-state bug stays, and the team stops trusting CI failures.

    source: GitHub docs — Continuous integration concepts ↗
  • [Commonly asked · senior · concept · common] What is a deployment gate / required approval, and where do manual gates belong in a pipeline?

    A gate is a condition that must pass before a stage proceeds — automated (tests green, security scan clean, smoke checks pass) or manual (a required human approval). In GitHub Actions you implement this with environments that have required reviewers and optionally a wait timer or branch restrictions; a job targeting that environment pauses until approved.

    Where gates belong: automated quality gates everywhere (fail fast on tests/lint/scans), and a manual approval only at the boundary you actually want a human to own — typically the promotion to production. That manual prod gate is exactly the line between continuous *delivery* (human approves prod) and continuous *deployment* (no gate). You also gate to protect the production *secrets/credentials*, which are scoped to that environment and unlocked only after approval.

    The senior framing: minimize manual gates (they create bottlenecks and false confidence) and lean on strong automated checks; reserve human approval for genuinely high-risk, irreversible promotions.
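
    A minimal sketch of a single manual gate at the production boundary. It assumes an environment named `production` has been created in the repository settings with required reviewers, that a `test` job exists elsewhere in the same workflow, and that `./scripts/deploy.sh` and the secret `DEPLOY_TOKEN` are hypothetical.

    ```yaml
    # Sketch only: the environment must be configured with required reviewers in
    # the repository settings; the deploy script and secret name are hypothetical.
    jobs:
      deploy-prod:
        needs: test                  # automated gate: tests must be green first
        runs-on: ubuntu-latest
        environment:
          name: production           # manual gate: job pauses until a reviewer approves
        steps:
          - uses: actions/checkout@v4
          - run: ./scripts/deploy.sh production
            env:
              DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # environment secret, available only after approval
    ```

    The approval rule itself lives in the environment's settings rather than in the YAML, so the workflow file only names the environment.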

    What a strong answer covers
    • A gate blocks a stage until a condition passes — automated (tests/scans) or manual (approval).

    • GitHub Actions: environments with required reviewers / wait timer pause a job until approved.

    • Put automated gates everywhere (fail fast); reserve manual approval for the prod promotion.

    • That manual prod gate is the line between continuous delivery and continuous deployment.

    • Environment gates also protect prod secrets, unlocked only after the gate passes.

    Red flag: Gating every stage with manual approvals 'to be safe'. It creates bottlenecks and rubber-stamp approvals; strong automated gates plus a single human gate at prod promotion is the better pattern.

    source: GitHub docs — Using environments for deployment ↗
  • [Commonly asked · mid · concept · common] Why and how do you cache dependencies in CI? What's the difference between caching and an artifact?

    CI runners start clean every run, so without caching you re-download every dependency on each build, which is slow and wasteful. A dependency cache restores a directory such as ~/.npm (npm's download cache) or node_modules, keyed on a hash of the lockfile (package-lock.json): a cache *hit* restores it in seconds; a cache *miss* (lockfile changed) downloads everything again and saves a fresh cache. With npm ci, cache ~/.npm rather than node_modules, because npm ci deletes node_modules before installing. In GitHub Actions the setup-* actions can do this with one cache: line, or you use actions/cache directly.

    The distinction interviewers want: a cache is a build-time optimization — it is keyed, can be evicted, and you must never *depend* on it existing (a miss must still produce a correct build). An artifact is an *output* you deliberately persist — the built binary/image/test report you pass between jobs or download later. Cache = speed, may vanish; artifact = a result you must keep.

    Key the cache carefully: too broad and you serve stale deps; too narrow and you never hit it. Hashing the lockfile is the sweet spot.
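
    To make the cache/artifact split concrete, here is a sketch that assumes an npm project with a `build` script writing to `dist/`; the key prefix and artifact name are arbitrary. The cache step may hit or miss without changing the result, while the upload step persists an output on purpose.

    ```yaml
    # Sketch only: assumes an npm project with a `build` script that writes to dist/.
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/cache@v4           # cache: speed only, safe to miss
            with:
              path: ~/.npm
              key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          - run: npm ci                      # still correct on a cache miss, just slower
          - run: npm run build
          - uses: actions/upload-artifact@v4 # artifact: an output kept for later jobs or download
            with:
              name: dist
              path: dist/
    ```

    Because the key includes the lockfile hash, changing a dependency produces a new key and therefore a miss, which is what keeps stale packages from being restored.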

    What a strong answer covers
    • Runners are ephemeral; caching avoids re-downloading deps every run.

    • Key the cache on a lockfile hash — hit restores fast, miss rebuilds and re-saves.

    • Cache = build-time speedup, evictable, must never be *required* for correctness.

    • Artifact = a deliberate output you persist (binary/image/report) and pass between jobs.

    • Bad cache keys cause stale dependencies (too broad) or constant misses (too narrow).

    Quick self-check

    What is the right cache key for a Node project's `node_modules` cache?

    Red flag: Treating a cache like an artifact and depending on it being present, or keying it too loosely so a stale `node_modules` is restored after the lockfile changed, leading to 'works in CI but with old deps' bugs.

    source: GitHub docs — Caching dependencies to speed up workflows ↗
  • [Commonly asked · mid · concept · occasional] How do you run the same CI job across multiple language versions or OSes efficiently?

    Use a build matrix. Instead of copy-pasting a near-identical job per Node version or OS, you declare a matrix and CI fans out one job per combination automatically, running them in parallel. In GitHub Actions:

    strategy:
      matrix:
        node: [18, 20, 22]
        os: [ubuntu-latest, windows-latest]

    That single job definition expands to 6 parallel jobs (3 versions × 2 OSes), each on its own runner. You can include/exclude specific combinations and set fail-fast (cancel the rest on first failure) on or off depending on whether you want full results.

    The value is coverage without duplication: test the support matrix you promise users, catch a version-specific break early, and keep the workflow DRY. The tradeoff is runner minutes — a wide matrix multiplies cost, so test the combinations that matter, not every permutation.
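
    A sketch of trimming and extending a matrix with `exclude`, `include`, and `fail-fast`. The versions and the specific excluded and added pairs are illustrative only, and an npm project is assumed.

    ```yaml
    # Sketch only: the versions and the excluded/added pairs are illustrative.
    jobs:
      test:
        runs-on: ${{ matrix.os }}
        strategy:
          fail-fast: false                 # keep the other legs running after one fails
          matrix:
            node: [18, 20, 22]
            os: [ubuntu-latest, windows-latest]
            exclude:
              - node: 18
                os: windows-latest         # drop a combination that is not supported
            include:
              - node: 22
                os: macos-latest           # add one extra combination outside the cross-product
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: ${{ matrix.node }}
          - run: npm ci
          - run: npm test
    ```

    Setting `runs-on` from `matrix.os` is what actually places each leg on a different operating system; without it the `os` values would only be labels.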

    What a strong answer covers
    • A matrix fans one job definition out into one parallel job per combination.

    • matrix: { node: [...], os: [...] } expands to the cross-product, each on its own runner.

    • include/exclude tune specific combos; fail-fast controls cancel-on-first-failure.

    • Gives coverage of your support matrix without duplicating job YAML.

    • Cost grows with the cross-product — test combinations that matter, not every permutation.

    Red flag: Duplicating an entire job per version/OS instead of using a matrix. It's verbose, drifts out of sync, and you forget to update one copy; the matrix keeps all combinations defined in one place.

    source: GitHub docs — Running variations of jobs in a workflow (matrix) ↗
  • [Commonly asked · senior · concept · occasional] Why is a fast CI feedback loop so important, and how do you keep a pipeline fast as it grows?

    The whole point of CI is fast feedback on whether a change is safe. A pipeline that takes 40 minutes breaks the developer's flow — they context-switch, stack up un-merged PRs, and start ignoring or working around the signal. Speed is what keeps CI trustworthy and keeps people integrating frequently.

    Keep it fast as it grows: parallelize (split the test suite across runners / use a matrix), fail fast by ordering cheap checks first (lint and unit tests before slow e2e), cache dependencies and build outputs, and only run what changed for large monorepos (path filters / affected-project detection). Build the artifact once and promote it rather than rebuilding per stage.
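
    Two of these tactics, path filters and cheap-checks-first ordering, might look like the sketch below. The `apps/web/**` path and the npm script names (`lint`, `test:unit`, `test:e2e`) are hypothetical.

    ```yaml
    # Sketch only: the paths and npm script names are hypothetical.
    name: Web CI

    on:
      pull_request:
        paths:
          - "apps/web/**"              # skip this workflow when nothing here changed
          - "package-lock.json"

    jobs:
      quick-checks:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
              cache: npm
          - run: npm ci
          - run: npm run lint
          - run: npm run test:unit

      e2e:
        needs: quick-checks            # slow suite starts only after the cheap checks pass
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 20
              cache: npm
          - run: npm ci
          - run: npm run test:e2e
    ```

    Chaining with `needs` trades some wall-clock time for runner minutes: a lint failure never pays for the e2e run, but a green PR waits for both jobs in sequence.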

    The senior framing: treat pipeline duration as a product metric you budget and watch — when a stage gets slow, profile it like you would slow code. A flaky or slow pipeline is a tax on every single merge.

    What a strong answer covers
    • CI exists for fast feedback; a slow pipeline breaks flow and erodes trust in the signal.

    • Parallelize test suites and use matrices to spread work across runners.

    • Fail fast: cheap checks (lint, unit) before slow ones (integration, e2e).

    • Cache deps/build outputs and only run what changed in big monorepos.

    • Treat pipeline duration as a tracked metric — profile a slow stage like slow code.

    Red flag: Letting pipeline time creep unbounded. Once feedback takes tens of minutes, developers batch changes and stop trusting CI, which defeats the purpose of continuous integration entirely.

    source: GitHub docs — About continuous integration ↗
  • [Commonly asked · senior · debug · occasional] A deploy to production succeeds but the app is broken; rolling back code didn't fix it. How do you reason about the failure and prevent it?

    First separate the layers: a 'green' deploy only means the *pipeline* succeeded, not that the app *works*. If rolling back the code didn't fix it, the breakage is almost certainly not in the code artifact — look at the things that aren't versioned with the image: a database migration that already ran (and is irreversible), a changed config/feature flag, a new infra/secret value, or a dependency/external service.

    The migration case is the classic trap: code rolls back instantly, but a schema change (dropped column, altered type) does not, so old code now hits an incompatible schema. The discipline is backward-compatible, expand-then-contract migrations — deploy schema changes that both old and new code can run against, ship code, then remove the old shape in a later release — so rolling back code is always safe.

    Prevention: add post-deploy smoke tests/health checks that gate the rollout (so a broken deploy auto-rolls-back before users see it), decouple migrations from code deploys, use feature flags to separate 'deployed' from 'released', and ensure rollbacks are actually tested, not assumed.
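
    A post-deploy smoke check that triggers a rollback could be sketched as follows. The deploy and rollback scripts and the health URL are hypothetical placeholders.

    ```yaml
    # Sketch only: the scripts and the health URL are hypothetical placeholders.
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - run: ./scripts/deploy.sh production
          - name: Smoke check
            run: curl --fail --silent --show-error https://example.com/healthz
          - name: Roll back on failure
            if: failure()              # runs only when an earlier step in this job failed
            run: ./scripts/rollback.sh production
    ```

    This only reverts the code artifact. It does nothing for a migration that has already run, which is why the backward-compatible migration discipline above still matters.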

    What a strong answer covers
    • A green pipeline ≠ a working app — 'success' is about the deploy, not behavior.

    • If code rollback didn't help, the cause is unversioned state: migrations, config, flags, secrets, deps.

    • Irreversible DB migrations are the classic trap — code reverts, schema doesn't.

    • Fix with expand-then-contract backward-compatible migrations so rollback is always safe.

    • Prevent with post-deploy smoke tests that gate/auto-rollback, plus feature flags to separate deploy from release.

    Red flag: Assuming a code rollback always restores a known-good state. Irreversible schema migrations and out-of-band config changes aren't part of the artifact, so the rollback leaves old code running against a changed world.

    source: GitHub docs — About continuous deployment ↗
  • [Commonly asked · senior · concept · occasional] Why is trunk-based development paired with feature flags so common in CI/CD, and what problem does it solve over long-lived branches?

    Long-lived feature branches drift away from main for days or weeks, so when they finally merge you get merge hell — big, painful, conflict-ridden integrations exactly when you can least afford surprises. That defeats the 'continuous' in continuous integration, whose whole premise is integrating *frequently* so problems surface in small, cheap increments.

    Trunk-based development has everyone commit small changes to main (or very short-lived branches merged within a day), keeping the branch always releasable. The obvious tension: how do you merge unfinished work without shipping it? Feature flags — you merge the code behind an off-by-default flag, so it's integrated and tested continuously but invisible to users until you flip it on. This also decouples deploy from release: deploying code and exposing a feature become separate decisions, enabling canary/gradual rollouts and instant kill-switches.

    Senior framing: small frequent merges + flags keep integration cheap and continuous and make release a runtime toggle rather than a deployment event — at the cost of flag hygiene (you must clean up stale flags).

    What a strong answer covers
    • Long-lived branches drift from main → painful big-bang merges that defeat continuous integration.

    • Trunk-based: small frequent commits to main, kept always releasable.

    • Feature flags let you merge unfinished work off-by-default — integrated and tested, not yet exposed.

    • Flags decouple deploy from release: shipping code and turning a feature on are separate decisions.

    • Enables canary/gradual rollout + instant kill-switch; cost is flag hygiene (remove stale flags).

    Red flag: Sitting on a long-lived branch 'until the feature is done'. It diverges from main and turns into a high-risk merge; the CI premise is to integrate small changes continuously, using flags to hide the unfinished parts.

    source: GitHub docs — About continuous integration ↗