Code quality

Code-review practices (small PRs, design not just bugs) and clean-code/refactoring principles.

Code quality is mostly a people and process discipline, not a tooling one. The two levers with the highest payoff are how you run code review and how you treat refactoring — both aimed at the same goal: keeping the codebase cheap to change as it grows. Code is read far more often than it’s written, and a team’s velocity a year in is set less by how fast it types than by how cheaply it can change what’s already there.

Key vocabulary

Code review: A second engineer reads a change before it merges — catching bugs, but more importantly checking design, sharing context, and keeping the codebase coherent.
Small PR: A pull request scoped to one logical change, small enough to review carefully in one sitting. Reviewers reliably find issues in small diffs and rubber-stamp large ones.
Refactoring: Changing the internal structure of code without changing its external behavior — to make it clearer or easier to extend. Behavior-preserving by definition; tests should still pass unchanged.
Boy-scout rule: "Leave the code a little cleaner than you found it." Small, opportunistic improvements as you pass through, so quality trends up instead of decaying.
Technical debt: The future cost of a shortcut taken now — like financial debt, it accrues interest (every later change is slower) until you "pay it down" by refactoring.
Clean code: Code optimized for the reader: intention-revealing names, small single-purpose functions, minimal surprise. Written once, read many times.

Code review: design, not just bugs

The biggest mistake teams make is treating review as bug-hunting. Linters and tests already catch most mechanical issues; a human reviewer’s unique value is judging things a machine can’t — is this the right design? does it fit the existing patterns? will the next person understand it? is there a simpler approach? Bugs are the floor of review, not the ceiling.

Reviewing for	Example question	Could a tool catch it?
Design / approach	Is there a simpler structure? Does this belong in this layer?	No — this is the human's main job
Readability	Will someone understand this in 6 months without the author?	Mostly no
Correctness / edge cases	What about empty input, concurrency, the error path?	Partly (tests)
Style / formatting	Indentation, import order, naming conventions	Yes — automate it, don't review it

Let tools handle style; spend human review on design and readability.

Why small PRs get real reviews

Reviewer attention doesn’t scale with diff size — it collapses. A small diff gets a line-by-line read; a giant one gets skimmed and rubber-stamped, precisely when the stakes are highest.

FIG 1 · review quality vs PR size Real scrutiny holds for small diffs and falls off a cliff as they grow — a 2,000-line PR gets less review, not more.

Why small PRs get real reviews

Reviewer attention doesn’t scale with diff size — it collapses.

PR size            Typical review outcome
─────────────────  ────────────────────────────────────────────
~50 lines          line-by-line read; real design feedback
~200 lines         still careful; most issues caught
~500 lines         skimming; only obvious bugs flagged
2,000+ lines       "LGTM"  ← a rubber stamp, not a review

A giant PR doesn’t get more scrutiny because it’s important — it gets less, because the reviewer can’t hold it all in their head. Splitting the same work into four small, independently reviewable PRs gets each one actually read. Small PRs also merge faster, conflict less, and are far easier to revert if something goes wrong.

Clean code and refactoring

Clean code is written for the next reader. The highest-leverage habits are concrete: intention-revealing names (a good name removes the need for a comment), small functions that do one thing (so they fit in your head and can be named precisely), and avoiding surprise (a function should do what its name says and nothing more). You write a line once and read it dozens of times — optimize for the reading.

A behavior-preserving refactor (tests stay green)

The original works, but a tangle of conditions hides the intent and the magic numbers explain nothing:

// BEFORE: what does any of this mean?
function p(u, t) {
  if (u.s === 1 && t > 100) return t * 0.9;
  if (u.s === 1) return t * 0.95;
  return t;
}

Refactor to reveal intent — no behavior changes, so the existing tests must still pass unchanged:

// AFTER: same outputs, but the rules are now readable
const PREMIUM = 1;
const BULK_THRESHOLD = 100;

function priceForUser(user, total) {
  if (user.status !== PREMIUM) return total;
  const discount = total > BULK_THRESHOLD ? 0.10 : 0.05;
  return total * (1 - discount);
}

This is the discipline in miniature: clearer names, named constants instead of magic numbers, one early return for the non-premium case — and crucially, the outputs are identical. The test suite is the proof: if a single test changes, it wasn’t a refactor, it was a behavior change wearing a refactor’s clothes.

Refactoring is changing internal structure without changing behavior — by definition, the tests should pass before and after. That’s what makes it safe: a solid test suite is the safety net that lets you restructure aggressively. The senior judgment is when: not a grand “rewrite week,” but the boy-scout rule applied continuously (tidy the bit you’re touching) and a deliberate refactor when a change becomes painful — when adding the next feature is fighting the current structure, that friction is the signal to restructure first, then add the feature cleanly.

01 Learning objectives

0 / 2 done

02 Interview questions

browse all ↗

What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.

Commonly asked senior concept common What makes a good code review, and what should reviewers actually look for?
A good review judges whether the change improves the overall health of the codebase — not whether it is perfect. Reviewers look for, roughly in priority order:
- Design: does the change belong here, fit the architecture, and not over-engineer?
- Correctness & edge cases: logic, error handling, concurrency, security.
- Tests: do they exist and actually exercise the behavior?
- Naming, clarity, comments: will the next reader understand it?
- Consistency with project conventions.
Process matters too: keep PRs small (faster, deeper reviews), review promptly to unblock people, comment kindly and explain the 'why', and distinguish blocking issues from optional nits (label them). The goal is shared understanding and a healthier codebase, not gatekeeping.
Follow-ups they push on
- Why are small PRs reviewed better than large ones?
- How do you give critical feedback without demoralizing the author?
Red flag Reviewing only for style/formatting (which a linter should catch) while rubber-stamping the design — the expensive bugs live in design and edge cases.
source: Google — Code Review Developer Guide (What to look for) ↗
Commonly asked mid concept common What is refactoring, and when is the right time to do it?
Refactoring is changing the *internal structure* of code to make it easier to understand and cheaper to modify, without changing its observable behavior. The behavior-preserving part is what makes it safe — and why a solid test suite is its prerequisite.
When: not as a separate 'refactoring sprint' but continuously, woven into feature work. The pragmatic trigger is the rule of three / refactor-when-it-hurts — when you are about to add a feature and the existing design fights you, first refactor to make the change easy, then make the easy change. Plus the boy-scout rule: leave each file a little cleaner than you found it.
Follow-ups they push on
- Why is refactoring without tests dangerous?
- What's the difference between refactoring and rewriting?
Red flag Calling any code change 'refactoring' even when it alters behavior — that conflation is how 'refactors' sneak in bugs and scope creep.
source: Martin Fowler — Refactoring ↗
Commonly asked junior concept common What does 'clean code' mean to you? Name a few concrete principles.
Clean code is code optimized for the *reader*, since code is read far more than it is written. Concrete principles:
- Intention-revealing names — a name should say what something is/does so you don't need a comment to explain it.
- Small, single-purpose functions — one level of abstraction, do one thing.
- DRY — don't duplicate knowledge; but don't abstract prematurely either.
- **Comments explain *why*, not *what* — the code shows what; comments justify non-obvious decisions.
- Consistent style** — let formatters/linters handle it so reviews focus on substance.
The through-line: minimize the cognitive load on the next person (often future-you).
Follow-ups they push on
- When does DRY go too far and create the wrong abstraction?
- Why is a comment that restates the code a smell?
Red flag Reciting buzzwords (DRY, SOLID) without the underlying goal — readability and changeability — or over-applying DRY into a tangled wrong abstraction.
source: Martin Fowler — Two Hard Things (naming) / CodeAsDocumentation ↗
Commonly asked senior concept common What is technical debt, and how do you decide whether to pay it down?
Technical debt is the implied future cost of choosing an easy-now solution over a better-but-slower one — like financial debt, it accrues 'interest' as every future change in that area takes longer.
Fowler's quadrant is useful: debt can be *deliberate or inadvertent* and *prudent or reckless*. Deliberate-prudent debt ('we'll ship now and refactor next sprint, and we know the tradeoff') is a legitimate engineering decision; reckless debt ('what's layering?') is not.
Deciding to pay it down: prioritize debt in code you touch often (high interest) over dead corners; pay it down opportunistically as you work nearby (boy-scout rule) rather than via giant rewrites; and make the cost visible to stakeholders so it competes fairly with features.
Follow-ups they push on
- Why is debt in rarely-touched code often fine to leave?
- How do you make tech debt visible to non-engineering stakeholders?
Red flag Treating all tech debt as equally urgent (or all of it as 'just bad code') — debt in hot paths costs far more than debt in stable, untouched code.
source: Martin Fowler — Technical Debt Quadrant ↗
Commonly asked mid concept occasional What's the difference between a linter and a formatter, and why automate both?
A formatter (Prettier, gofmt, Black) rewrites code to a canonical *style* — indentation, quotes, line length. It is purely cosmetic and deterministic.
A linter (ESLint, Ruff, golangci-lint) analyzes code for *problems and smells* — unused variables, likely bugs, anti-patterns, sometimes security issues. It catches substance, not just style.
Automate both, ideally in pre-commit hooks and CI, because it removes whole categories of nit-picking from human review. When formatting and trivial issues are settled by tools, reviewers spend their attention on design and correctness — the things only humans can judge. It also keeps style consistent regardless of who wrote the code.
Follow-ups they push on
- Why run these in CI even if developers have editor integration?
- How does auto-formatting reduce diff noise in code review?
Red flag Conflating the two, or relying on humans to enforce style in review — that wastes reviewer attention on what a tool should settle automatically.
source: Prettier — Prettier vs. Linters ↗
Commonly asked senior concept occasional Name a few code smells and explain what each one signals.
A code smell is a surface symptom that *hints* at a deeper design problem — not a bug, but a prompt to look closer. Common ones:
- Long method / large class: too many responsibilities; signals a need to extract functions/classes.
- Duplicated code: the same knowledge in many places — change one, miss the others (DRY violation).
- Long parameter list: often a missing object that should group related params.
- Feature envy: a method that mostly uses *another* object's data — behavior is in the wrong place.
- Shotgun surgery: one change forces edits across many files — poor cohesion.
- Primitive obsession / magic numbers: missing a domain type or named constant.
The value is that smells give a shared vocabulary for review and point toward the right refactoring — but they are heuristics, not hard rules.
Follow-ups they push on
- Why is a smell a *hint* rather than a definitive 'this is wrong'?
- Which refactoring addresses 'shotgun surgery'?
Red flag Treating every smell as a mandatory fix — sometimes the 'smelly' code is the pragmatic choice; smells prompt investigation, not reflexive rewrites.
source: Martin Fowler — CodeSmell ↗
Commonly asked mid concept occasional What does cyclomatic complexity measure, and why is high complexity a problem?
Cyclomatic complexity counts the number of independent paths through a piece of code — essentially one plus the number of decision points (if, for, while, case, &&/||, ?:). A straight-line function is 1; each branch adds a path.
Why it matters: it correlates with how hard the code is to understand, test, and maintain. It's also a lower bound on the number of test cases needed to cover every path — a function with complexity 15 needs at least 15 paths exercised to test thoroughly, which is a strong hint it's doing too much. High complexity concentrates risk: the more tangled the branching, the more places a bug can hide.
Use it as a heuristic flag, not a hard law — a high score points you at a function worth simplifying (extract method, replace nested conditionals with guard clauses or polymorphism), but a naturally branchy dispatch can be legitimately high. Linters can fail a build over a threshold to keep it visible.
What a strong answer covers
- Measures independent paths ≈ 1 + count of decision points (branches/loops/boolean operators).
- Higher = harder to understand, test, maintain; it's a lower bound on test cases needed for path coverage.
- A high score flags a function doing too much — a candidate for extract-method / guard clauses.
- It's a heuristic, not gospel — some dispatch logic is legitimately branchy.
Quick self-check
What does a high cyclomatic complexity number most directly indicate?
Follow-ups they push on
- Why is cyclomatic complexity a lower bound on the number of tests for full path coverage?
- Which refactorings most directly reduce a function's complexity score?
Red flag Treating a complexity threshold as an absolute rule — it's a signal to investigate, and gaming the number (splitting one clear function into confusing fragments) can hurt readability more than it helps.
source: NIST — Cyclomatic Complexity (Structured Testing) ↗
Commonly asked senior concept common What are coupling and cohesion, and why do we want low coupling and high cohesion?
Cohesion is how strongly the things *inside* a module belong together — high cohesion means a module has one clear, focused responsibility. Coupling is how dependent modules are on *each other's* internals — low coupling means modules interact through small, stable interfaces and can change independently.
We want high cohesion, low coupling because together they localize change. With high cohesion a single concern lives in one place (you know where to look, and the change stays contained). With low coupling, changing one module doesn't ripple into others. The opposite — low cohesion, high coupling — produces the shotgun surgery smell (one change forces edits everywhere) and fragile code where a tweak in module A mysteriously breaks module B.
This is the engine behind modularity, SOLID's single-responsibility and dependency-inversion principles, and why you depend on interfaces rather than concrete implementations.
What a strong answer covers
- Cohesion = how well a module's internals belong together (want it high — one responsibility).
- Coupling = how much modules depend on each other's internals (want it low — small stable interfaces).
- Together they localize change: a concern lives in one place and edits don't ripple outward.
- Low cohesion + high coupling → shotgun surgery and fragile, change-resistant code.
Follow-ups they push on
- How does depending on an interface instead of a concrete class reduce coupling?
- Which code smell is the direct symptom of low cohesion across modules?
Red flag Optimizing one in isolation — e.g. splitting code into many tiny modules can lower per-module size while *raising* coupling (everything calls everything); you want both directions right together.
source: Martin Fowler — Reducing Coupling (Beck Design Rules) ↗
Google senior concept common How do you give code-review feedback that improves the code without alienating the author?
Review the code, not the person, and assume competence — phrase comments about the change ('this query runs N+1') rather than the author ('you always...'). Explain the why behind a suggestion so it teaches rather than dictates, and prefer asking ('what happens if items is empty here?') over commanding when you're unsure.
Distinguish blocking issues from preferences: label optional suggestions explicitly (Google's convention is prefixing nits with Nit:) so the author knows what must change versus what's taste. Don't let perfect block good — if a change improves the codebase's health overall, approve it even if it isn't exactly how you'd write it; file follow-ups for non-urgent improvements.
Process courtesies matter too: review promptly to avoid blocking people, keep feedback respectful and specific, and recognize good work, not just problems. The goal is a healthier codebase and a team that *wants* their code reviewed.
What a strong answer covers
- Critique the code, not the person; assume competence and explain the why.
- Label nits/optional suggestions vs blocking issues so the author knows what's required.
- Don't gatekeep on perfection — approve net improvements; file follow-ups for the rest.
- Review promptly and respectfully; the aim is codebase health *and* a team that welcomes review.
Follow-ups they push on
- Why is prefixing optional comments with 'Nit:' valuable to the author?
- When should a reviewer approve a change that isn't exactly how they'd write it?
Red flag Blocking a net-positive change over personal style preferences, or phrasing feedback as commands/attacks — it breeds resentment and slows the team without improving the code.
source: Google — Code Review Developer Guide (How to comment) ↗
Commonly asked senior trick occasional The team wants to stop feature work for a 'big rewrite' to fix the messy codebase. What's your take?
Push back. Big-bang rewrites are notoriously risky: you spend months reproducing existing behavior (including the undocumented edge cases the old code quietly handles), ship no value during the freeze, and often discover the new system has its own mess by the time it's done — the famous 'second-system' trap. Meanwhile the business is frozen and a parallel old-vs-new maintenance burden appears.
The pragmatic alternative is incremental refactoring under a green test suite, often via the Strangler Fig pattern: build the new behavior around the edges of the old system, route traffic to it piece by piece, and retire the old parts gradually — delivering value continuously and keeping rollback cheap. Pay down debt where you're already working (boy-scout rule) and where it has the highest interest.
A rewrite is occasionally justified (the platform is truly dead, or constraints changed fundamentally), but the default answer is: refactor incrementally, keep shipping, and make the cost of debt visible so it's prioritized — not a heroic stop-the-world bet.
What a strong answer covers
- Big-bang rewrites freeze value delivery and must re-derive every undocumented edge case the old code handles.
- They invite the second-system effect and a long old-vs-new dual-maintenance period.
- Prefer incremental refactoring behind tests, e.g. the Strangler Fig — replace piece by piece, keep shipping.
- Pay down debt where you already work and where interest is highest; make the cost visible.
Quick self-check
What's the strongest argument against a big-bang rewrite of a working legacy system?
Follow-ups they push on
- What is the Strangler Fig pattern and how does it de-risk replacing a legacy system?
- What rare conditions actually justify a full rewrite over incremental refactoring?
Red flag Defaulting to a stop-the-world rewrite — it usually overruns, loses hard-won edge-case behavior, ships nothing for months, and lands you with a new mess; incremental refactoring is the lower-risk path.
source: Martin Fowler — StranglerFigApplication ↗
Google senior concept occasional How do you keep code review and quality gates from becoming a bottleneck that slows the team down?
The dominant lever is small changes. A small PR is reviewed faster, more thoroughly, and merges before it rots; large PRs sit for days, get rubber-stamped, and block their authors. Google's guidance is explicit that small CLs are central to fast, high-quality review.
Reduce the human cost by automating what doesn't need judgment: formatters and linters settle style, CI runs the tests, security/dependency scanners flag the obvious — so reviewers spend their limited attention on design and correctness, not whitespace. Set an SLA for review turnaround (review promptly so authors aren't blocked) and make review a first-class part of the day, not an interruption deferred indefinitely.
Also right-size the gate: not every change needs the same rigor, and 'don't let perfect be the enemy of good' — approve net improvements and file follow-ups. The goal is a fast, trustworthy pipeline, not maximal ceremony.
What a strong answer covers
- Small PRs are the biggest lever — faster, deeper review; large PRs block authors and get rubber-stamped.
- Automate the judgment-free checks (lint/format/tests/scanners) so humans review design and correctness.
- Set a review-turnaround SLA and treat review as first-class work, not a deferred interruption.
- Right-size rigor and approve net improvements with follow-ups — don't let perfect block good.
Follow-ups they push on
- Why does a small PR get a higher-quality review than a large one?
- Which checks should never reach a human reviewer at all?
Red flag Trying to fix slow reviews by lowering standards or skipping review — the real fixes are smaller changes, automation of trivial checks, and a turnaround SLA, which speed things up *without* sacrificing quality.
source: Google — Code Review Developer Guide (Small CLs) ↗