> cs·fundamentals
7.7 ★ core [B][J] · 28m read · 10 interview Q's

Working with AI agents

The workflow and vocabulary for getting good output from a coding agent — context, prompting, plan-then-code, the edit-test-review loop, and the safety rails.

A coding agent isn’t a search box — it’s a teammate that can read your files, write code, run it, and keep going until a task is done. Getting good work out of it is a skill: feed it the right context, give it a real task with an acceptance check, and review what it produces instead of trusting it. This chapter is that workflow.

What the agent can actually see

The single biggest lever on output quality is context. The agent reasons over what’s in its window; everything outside it might as well not exist. Point it at the right files and examples and the work gets sharp; leave it guessing and it drifts. The window is also finite — you can’t pour the whole repo in — so the skill is curation: the few files that matter, an example to mirror, and the one rule that constrains the change. Stale or irrelevant context is worse than none; it actively misleads.

[Figure: a large ‘context window’ box (what it sees) holds your instructions, files you opened, recent messages, and an example / screenshot; outside it sits a greyed-out ‘rest of the repo’ (not seen).]
FIG 1 · the context window The agent reasons over what's inside the window. The rest of the repo is invisible until you point at it.
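
One way to picture curation is as a budgeting problem. The sketch below is illustrative only: the `ContextItem` shape, the priority numbers, the `src/components/Input.astro` file name, and the "about 4 characters per token" estimate are all assumptions made for the example, not how any particular agent selects context.

```typescript
// Hypothetical shape for a candidate piece of context; not any tool's real API.
interface ContextItem {
  path: string;
  text: string;
  priority: number; // higher = more relevant to the task at hand
}

// Rough assumption: about 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily keep the highest-priority items that still fit in the budget.
function curateContext(items: ContextItem[], budgetTokens: number): ContextItem[] {
  const byPriority = [...items].sort((a, b) => b.priority - a.priority);
  const chosen: ContextItem[] = [];
  let used = 0;
  for (const item of byPriority) {
    const cost = estimateTokens(item.text);
    if (used + cost <= budgetTokens) {
      chosen.push(item);
      used += cost;
    }
  }
  return chosen;
}

// Placeholder file contents; only their lengths matter for the estimate.
const candidates: ContextItem[] = [
  { path: "src/pages/products.astro", text: "x".repeat(400), priority: 3 },
  { path: "src/components/Input.astro", text: "x".repeat(200), priority: 2 }, // hypothetical file name
  { path: "CHANGELOG.md", text: "x".repeat(4000), priority: 0 },
];

// With a 200-token budget the two relevant files fit; the large, irrelevant one is left out.
const picked = curateContext(candidates, 200).map((item) => item.path);
```

In practice you do this by hand when you choose which files to open or mention; the point of the sketch is only that relevance, not volume, decides what goes in.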

The loop

Good agent work runs a loop you stay inside: plan, edit, test, review the diff, commit. Skipping the review step is how plausible-but-wrong code lands in your repo. You own the two ends — the plan (is this the right approach?) and the review (is every change correct and wanted?); the agent owns the middle — the typing and the running. Keep the loop small: one focused task per pass, not a sprawling “build the whole feature” that’s impossible to review.

[Figure: a cycle of five stages, Plan → Edit → Test → Review diff → Commit, with the loop returning from Review to Plan when there is more to do.]
FIG 2 · the agent loop Plan → Edit → Test → Review the diff → Commit. You own the plan and the review; the agent owns the edit and the run.
Stage  | You                      | The agent             | How you verify
Plan   | state goal + constraints | proposes an approach  | does the plan match your intent?
Edit   | point at the right files | writes the changes    |
Test   | say how to check it      | runs tests / the app  | do they pass?
Review | read the diff            | explains what changed | is every change correct + wanted?
Commit | approve                  | writes the message    | small, focused commit?
You own the plan and the review; the agent owns the typing and the running.
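
The split of ownership can be sketched as one pass through the loop with two human gates. This is a toy model, not a real agent API: the `LoopHooks` names are hypothetical, and each hook stands in for something a person or an agent would actually do.

```typescript
type Stage = "plan" | "edit" | "test" | "review" | "commit";

// Hypothetical hooks; each stands in for a real human or agent action.
interface LoopHooks {
  approvePlan: () => boolean; // you: does the plan match your intent?
  applyEdit: () => void;      // agent: writes the change
  runTests: () => boolean;    // agent: runs the check you specified
  approveDiff: () => boolean; // you: is every change correct and wanted?
}

// Runs one pass and reports the stage it stopped at.
// Only a pass that clears both human gates and the tests reaches "commit".
function runPass(hooks: LoopHooks): Stage {
  if (!hooks.approvePlan()) return "plan";
  hooks.applyEdit();
  if (!hooks.runTests()) return "test";
  if (!hooks.approveDiff()) return "review";
  return "commit";
}

let editsApplied = 0;
const outcome = runPass({
  approvePlan: () => true,
  applyEdit: () => { editsApplied += 1; },
  runTests: () => true,
  approveDiff: () => false, // tests are green, but the reviewer rejects the diff
});
// outcome is "review": passing tests alone do not get a change committed.
```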

A good task prompt

The difference between a frustrating agent and a great one is usually the prompt. Give it a goal, constraints, an example, and an acceptance check. The acceptance check is the part people skip and the part that matters most: it turns “looks done” into “is done,” and it gives the agent a target to test itself against.

Weak prompt → strong prompt
WEAK:  "Add search to the app."

STRONG: "Add a search box to the products page (src/pages/products.astro).
 - Filter the existing product list by name, case-insensitive, as the user types.
 - Reuse the <Input> component in src/components/; match the dark style.
 - No new dependencies.
 Done when: typing 'mouse' shows only matching products and clearing restores all.
 Show me the plan before editing."

The strong version names where, gives constraints (reuse, no deps), and states an acceptance check. “Show me the plan first” lets you catch a wrong approach before any code is written — that’s what plan mode is for.
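
If you write prompts like this often, the four parts can be treated as a small data structure. The following is a minimal sketch under assumptions: the `TaskPrompt` type and `renderPrompt` function are hypothetical names invented for illustration, and the only rule it enforces is that an acceptance check must be present.

```typescript
// Hypothetical structure for a task prompt; field names are illustrative.
interface TaskPrompt {
  goal: string;
  constraints: string[];
  example?: string;        // an existing pattern to mirror, if any
  acceptanceCheck: string; // how "done" is verified
  planFirst: boolean;
}

// Renders the prompt as plain text and refuses to do so without an acceptance check.
function renderPrompt(task: TaskPrompt): string {
  if (task.acceptanceCheck.trim() === "") {
    throw new Error("Missing acceptance check: state how 'done' is verified.");
  }
  const lines: string[] = [task.goal];
  for (const constraint of task.constraints) {
    lines.push(` - ${constraint}`);
  }
  if (task.example) {
    lines.push(` Example to mirror: ${task.example}`);
  }
  lines.push(` Done when: ${task.acceptanceCheck}`);
  if (task.planFirst) {
    lines.push(" Show me the plan before editing.");
  }
  return lines.join("\n");
}

// The strong prompt from above, expressed as data.
const searchTask: TaskPrompt = {
  goal: "Add a search box to the products page (src/pages/products.astro).",
  constraints: [
    "Filter the existing product list by name, case-insensitive, as the user types.",
    "Reuse the <Input> component in src/components/; match the dark style.",
    "No new dependencies.",
  ],
  acceptanceCheck: "typing 'mouse' shows only matching products and clearing restores all.",
  planFirst: true,
};

const promptText = renderPrompt(searchTask);
```

The weak prompt cannot be expressed this way without inventing an acceptance check, which is exactly the decision it skipped.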

Reading a diff like you mean it

“Verify, don’t trust” is the whole game. An agent’s code can be confidently wrong — syntactically perfect, plausible, and subtly broken. Reading the diff is how you catch it. Don’t skim for green; read for intent: Does each changed line do what the task needed? Did it touch files it shouldn’t? Did it quietly delete a check, widen a type to any, or hard-code a value? If you can’t tell what a hunk is for, ask the agent to explain that hunk before you accept it.
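
Some of these questions can be partly mechanized as a first pass. The sketch below is a rough heuristic, not a substitute for reading the diff: the regular expressions are assumptions chosen for illustration, and it only points at lines that deserve a closer look.

```typescript
interface Finding {
  line: string;
  reason: string;
}

// Illustrative patterns only; tune them for your own codebase.
const removedPatterns: Array<[RegExp, string]> = [
  [/requireAuth|authorize|validate|assert/i, "removed a check"],
];
const addedPatterns: Array<[RegExp, string]> = [
  [/:\s*any\b|\bas any\b/, "widened a type to any"],
  [/req\.body\.userId/, "trusts a client-supplied user id"],
];

// Scans a unified diff and flags removed or added lines that match a pattern.
function flagDiff(diff: string): Finding[] {
  const findings: Finding[] = [];
  for (const line of diff.split("\n")) {
    // Skip file headers such as "--- a/file" and "+++ b/file".
    if (line.startsWith("---") || line.startsWith("+++")) continue;
    let patterns: Array<[RegExp, string]> = [];
    if (line.startsWith("-")) patterns = removedPatterns;
    else if (line.startsWith("+")) patterns = addedPatterns;
    for (const [pattern, reason] of patterns) {
      if (pattern.test(line)) findings.push({ line: line.trim(), reason });
    }
  }
  return findings;
}

// A made-up three-line diff: one removed check, one widened type, one context line.
const sampleDiff = [
  "-   const user = await requireAuth(req);",
  "+   const limit: any = req.query.limit;",
  "    const { title } = req.body;",
].join("\n");

const findings = flagDiff(sampleDiff);
```

A clean result from a script like this means nothing on its own; a hit means "ask the agent to explain this hunk."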

A diff worth a second look
  app.post("/api/todos", async (req, res) => {
-   const user = await requireAuth(req);
    const { title } = req.body;
    const todo = await db.todo.create({ data: { title, userId: req.body.userId } });
    res.status(201).json(todo);
  });

It “works” and the tests might even pass — but the agent removed the auth check and trusted req.body.userId from the client. That’s an authorization hole (see 7.3) hiding in a green diff. The review step exists to catch exactly this.
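
For contrast, here is a sketch of the shape the handler should keep. It is deliberately framework-free and synchronous so it can run on its own: the `Req` and `Result` types, the token table, the in-memory store, and this `requireAuth` are hypothetical stand-ins, not the Express or database code from the diff.

```typescript
// Minimal stand-ins so the sketch runs without Express or a database.
interface Req {
  headers: Record<string, string | undefined>;
  body: { title: string; userId?: string };
}
interface Todo {
  title: string;
  userId: string;
}
interface Result {
  status: number;
  body: Todo | { error: string };
}

// Hypothetical session table mapping a token to a user id.
const sessions: Record<string, string> = { "token-alice": "alice" };
const todos: Todo[] = [];

// Hypothetical auth helper: throws unless the request carries a known token.
function requireAuth(req: Req): { id: string } {
  const token = req.headers["authorization"];
  const userId = token ? sessions[token] : undefined;
  if (!userId) throw new Error("unauthenticated");
  return { id: userId };
}

function createTodo(req: Req): Result {
  let user: { id: string };
  try {
    user = requireAuth(req); // the check the agent's diff removed
  } catch {
    return { status: 401, body: { error: "unauthenticated" } };
  }
  // The owner comes from the verified session, never from req.body.userId.
  const todo: Todo = { title: req.body.title, userId: user.id };
  todos.push(todo);
  return { status: 201, body: todo };
}
```

A test that posts with someone else's `userId` in the body and asserts the stored owner is still the authenticated user is the kind of acceptance check that would have turned this diff red.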

The customization surfaces

Once the basics click, a handful of surfaces let you shape how the agent works so you stop repeating yourself. You don’t need all of them on day one — but knowing they exist tells you what’s tunable.

Surface             | What it is                                   | Reach for it when
Custom instructions | a CLAUDE.md/AGENTS.md the agent always loads | you keep repeating conventions/rules
Slash commands      | saved, reusable prompts you trigger by name  | a multi-step task you run often
MCP / tools         | extra capabilities: DB, API, docs, browser   | the agent needs to reach beyond your files
Subagents           | a fresh agent for a scoped sub-task          | a big job splits into independent pieces
Plan mode           | the agent plans but makes no edits yet       | risky/large changes you want to vet first
Surfaces for shaping the agent. Standing rules in CLAUDE.md; vet risky work in plan mode.

03 Knowledge check

Knowledge check · 4 questions · pass ≥ 70%

  1. (easy) Before committing AI-generated code, the essential step is to…

  2. (easy) Pasting API keys into the chat so the agent can use them is fine.

  3. (medium) The “context window” is…

  4. (medium) A strong task prompt for an agent includes… (select all)

04 Interview questions

What gets asked on this topic: for each question, how to approach it and the trap to avoid.

  • [Commonly asked · junior · concept · common] What is the context window, and why does it shape how you work with a coding agent?

    The context window is everything the model can 'see' at once — the system prompt, your instructions, the files and snippets you've shared, and the conversation so far, all measured in tokens with a hard limit.

    It shapes your workflow because the agent can't reason about code it hasn't been shown, and stuffing in irrelevant files wastes the budget and dilutes attention. So you deliberately feed it the right files, an example, and the acceptance criteria — and start fresh when a long thread gets noisy.

    Red flag: Assuming the agent 'remembers' your codebase. It only knows what's in the current context window, so share the relevant files.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · junior · concept · common] Describe a healthy loop for working with a coding agent on a real change.

    Plan → edit → test → review the diff → commit. First have it lay out a plan and agree on it before any code (plan mode helps). Then let it edit, run the tests, and crucially read the diff yourself before accepting — verify, don't trust. Commit in small, reviewable chunks.

    The discipline is treating the agent like a fast junior pair: you still own the review and the commit. Small loops with verification beat one giant unreviewed change.

    Red flag: Accepting a large change wholesale without reading the diff. Bugs and unintended edits slip through unverified.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · junior · concept · common] What makes a strong task prompt for an agent?

    Four parts: the goal (what done looks like), the constraints (don't touch X, use library Y, match this style), an example (an existing pattern to follow or sample input/output), and an acceptance check (the test or command that proves it works).

    This turns a vague wish into a spec the agent can hit and you can verify. The acceptance check is the part people skip — without it neither you nor the agent knows when it's actually done.

    Red flag: Giving only the goal ('add search') with no constraints, example, or check. You get plausible code that may miss the point.

    source: Anthropic, Prompt engineering overview
  • [Commonly asked · mid · concept · occasional] What safety rails do you keep in mind when letting an agent run in your repo?

    Core rails: never commit secrets (and don't let the agent paste keys into code or logs); review anything it generates before running it, especially shell commands and migrations; watch cost (long autonomous runs burn tokens); and slow down on risky changes (deletes, schema migrations, anything touching prod or auth).

    The mindset: the agent is fast and confident but not accountable — you are. Treat its output as a proposal to verify, not a command to execute blindly.

    Red flag: Granting blanket auto-run on everything. A confidently wrong destructive command (a bad `rm` or migration) can do real damage.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · junior · concept · occasional] What are custom instructions like CLAUDE.md / AGENTS.md for?

    They're a persistent, project-level brief the agent reads automatically — conventions, commands, architecture notes, do's and don'ts — so you don't re-explain them every session. They put durable context into the window without you pasting it each time.

    They're one of several customization surfaces, alongside slash commands (reusable prompts), MCP/tools (giving the agent new capabilities), subagents, and plan mode. The instruction file is the cheapest, highest-leverage one to start with.

    Red flag: Letting the file rot. Stale instructions actively mislead the agent; treat it as living documentation.

    source: Anthropic, Claude Code best practices
  • [★ must-know · Commonly asked · junior · concept · common] Why treat a prompt to a coding agent like a spec rather than a casual request?

    An agent does exactly what you describe, not what you meant — it has none of the shared context a teammate would fill in. A casual request ('add login') leaves a hundred decisions to chance: which auth method, where state lives, what the error states are, what 'done' means. The agent picks plausible answers, and you discover the gaps afterward.

    Treating the prompt as a spec front-loads those decisions: state the goal, the constraints, an example to match, and the acceptance check. This is the same discipline as writing a ticket for a junior dev. The clearer the spec, the less re-work — vague prompts don't save time, they move the cost to debugging plausible-but-wrong output.

    What a strong answer covers
    • The agent does what you say, not what you meant — it lacks your unstated context.

    • A casual ask leaves many decisions to chance; the agent guesses, you find gaps later.

    • A spec front-loads: goal, constraints, example, acceptance check — like a good ticket.

    • Vagueness doesn't save time; it relocates the cost to debugging wrong output.

    Quick self-check

    Which prompt is most likely to produce code you can ship with minimal rework?

    Red flag: Firing off a one-line wish and expecting the agent to infer your conventions, edge cases, and definition of done. It can't; it fills gaps with guesses.

    source: Anthropic, Prompt engineering overview
  • [Commonly asked · mid · debug · common] A long agent session starts making mistakes, contradicting earlier decisions, and 'forgetting' things. What's happening and what do you do?

    Long sessions degrade because the context window fills with accumulated history — old turns, dead ends, large file dumps — which both crowds out room for new work and dilutes the model's attention across noise. The earlier 'decisions' may have scrolled out of effective focus, so it drifts.

    The fix is to manage context deliberately: start a fresh session for a new sub-task, re-state the current goal and the few decisions that still matter, and re-share only the relevant files rather than the whole accumulated thread. Capture durable decisions in a project instruction file (CLAUDE.md) or a short summary you can paste back, so resetting the session doesn't lose them. Short, focused contexts beat one ever-growing thread.

    What a strong answer covers
    • Cause: the context window fills with history/noise, crowding and diluting attention.

    • Fix: start fresh, re-state the goal and the decisions that still matter.

    • Re-share only relevant files, not the entire accumulated conversation.

    • Persist durable decisions (instruction file / summary) so a reset loses nothing.

    Red flag: Pushing through in the same bloated thread, repeating yourself. The noise is the problem; a clean context with a crisp restatement works far better.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · mid · concept · occasional] What kinds of tasks should you NOT hand to an agent autonomously, and why?

    Avoid full autonomy where a confident mistake is expensive or irreversible: destructive operations (deletes, rm, dropping data), database schema migrations, anything touching production, security-sensitive code (auth, permissions, payment), and broad sweeping changes you can't easily review. These share a trait — the cost of being wrong is high and recovery is hard.

    The principle is *cost of error*. Where errors are cheap and caught by tests (a new pure function, a localized UI tweak), let the agent run. Where errors are catastrophic or hard to undo, keep a human in the loop: require confirmation, work on a branch, review the diff and the exact commands before they execute. Match autonomy to reversibility.

    What a strong answer covers
    • Hold back autonomy on destructive ops, migrations, prod changes, and auth/payment code.

    • Common thread: high cost of error and hard to undo.

    • Where errors are cheap and tests catch them, more autonomy is fine.

    • Match the autonomy you grant to how reversible the change is.

    Red flag: Granting blanket auto-approval so a confidently wrong destructive command (a bad migration or `rm`) executes before any human sees it.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · junior · concept · common] Why is 'review the diff before you accept it' the non-negotiable habit when working with an agent?

    An agent is fast and confident but not accountable — you are. It can make changes beyond what you asked (touching unrelated files, deleting code it deemed unnecessary, introducing a subtle bug) and it states all of it with equal confidence. The diff is your checkpoint: it shows exactly what changed before it becomes part of your code.

    Reading the diff is also where *you* stay in control of the codebase — you keep understanding what's in it, catch scope creep, and verify the change actually does what the spec asked. 'Verify, don't trust' is the whole posture. Skipping the diff is how unreviewed bugs and unintended edits slip into a repo nobody fully understands anymore.

    What a strong answer covers
    • The agent is fast and confident but not accountable — the human is.

    • It can make changes beyond the ask; the diff exposes exactly what changed.

    • Reviewing keeps you in command of the codebase and catches scope creep.

    • 'Verify, don't trust' — the diff is the checkpoint before code is yours.

    Red flag: Accepting big changes blind because they 'look right'. Confident, plausible code can carry unrelated edits and subtle bugs that only a diff review surfaces.

    source: Anthropic, Claude Code best practices
  • [Commonly asked · mid · debug · occasional] An agent keeps failing to fix a bug, trying variation after variation. How do you break the loop?

    Thrashing usually means the agent is missing something it needs, not that it needs more attempts. Stop and give it more or better context: the exact error message and stack trace, the relevant file it hasn't seen, how to reproduce, and what you've already ruled out. Often it's been guessing because the failing piece was never in its window.

    If that doesn't help, change the approach: ask it to first explain its diagnosis and a plan before editing (so you can catch a wrong mental model), narrow the task, or reset the session to clear accumulated wrong turns. And know when to take over: for a tricky bug, a human read of the actual error often beats a tenth blind attempt. More tries on the same starting context rarely converge; better context or a reset does.

    What a strong answer covers
    • Thrashing = missing context, not too few attempts.

    • Feed it the exact error, stack trace, repro steps, and what's been ruled out.

    • Make it state a diagnosis/plan before editing to expose a wrong mental model.

    • Reset the session to clear bad turns; know when to take over yourself.

    Red flag: Repeatedly saying 'still broken, try again' on the same context. Without new information the agent just cycles plausible guesses; add context or reset.

    source: Anthropic, Claude Code best practices