The AI coding toolbox
A taxonomy of AI coding tools — autocomplete, chat, terminal agent, AI IDE, app-builder — so you reach for the right one, plus a non-stale way to think about model tiers.
“AI coding tool” covers five genuinely different things, from a tool that finishes your line to one that builds a whole app from a sentence. Pick the wrong category and you’ll either micromanage a powerful agent or wait on a chat window for something an autocomplete would’ve done instantly. This chapter is the taxonomy — categories over brand names, because the brands churn.
A spectrum of autonomy
The categories line up by how much they do on their own — from suggesting a line to producing a whole app. More autonomy means more leverage and more to review. Think of it as a dial: at the left you’re the author and the tool is your faster fingers; at the right the tool is the author and you’re the editor. Neither end is “better” — they’re for different jobs.
| Category | What it is | Reach for it when | Watch out |
|---|---|---|---|
| Autocomplete | inline next-line suggestions | you're already coding and know the shape | easy to accept plausible-but-wrong lines |
| Chat assistant | ask → snippet → paste | explaining code, a one-off function, learning | no repo context unless you paste it |
| AI IDE | agent inside the editor | multi-file changes you want to watch | easy to over-trust the auto-edits |
| Terminal agent | edits, runs, tests, loops | real tasks across a repo, hands-off | give it scope + review the diff |
| App-builder | prose → deployable app | a prototype or MVP from scratch | great start, then you own the code |
Models: think in tiers, not names
Behind these tools sit frontier models, and the names change every few months. What doesn’t change is the shape: a fast/cheap tier, a balanced everyday tier, and a most-capable tier for hard reasoning. Match the tier to the task, not the hype — bigger isn’t always better, and it costs more and runs slower.
The cost / capability / latency tradeoff
Picking a tier is a real engineering decision, not an “always use the smartest one” reflex. A bigger model costs more per call and is slower to respond. For a task you run ten thousand times a day (tagging support tickets, completing a line), the fast/cheap tier is the correct choice, not a compromise: it’s cheaper, snappier, and plenty smart for the job. Save the most-capable tier for the hard 5% where a wrong answer is expensive.
| Task | Tier to reach for | Why |
|---|---|---|
| Autocomplete, simple classification, bulk renames | fast / cheap | high volume, low difficulty — speed + cost win |
| Everyday coding, chat, normal refactors | balanced | the daily driver — smart enough, still quick |
| Hard reasoning, big multi-file refactor, an agent loop | most-capable | correctness matters more than cost or speed |
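To make the tradeoff concrete, here is a rough arithmetic sketch. The per-call prices and latencies are made-up placeholders, not any vendor's real numbers, and the `Tier` and `daily_cost` names are hypothetical; check the provider's current pricing page before plugging in real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_call: float      # hypothetical price, NOT a real vendor number
    seconds_per_call: float  # hypothetical latency, NOT a real measurement


# Placeholder figures chosen only to show the shape of the tradeoff.
TIERS = {
    "fast": Tier("fast / cheap", 0.001, 0.5),
    "balanced": Tier("balanced", 0.01, 2.0),
    "most_capable": Tier("most-capable", 0.05, 8.0),
}


def daily_cost(tier: Tier, calls_per_day: int) -> float:
    """Total spend per day if every call goes to a single tier."""
    return tier.usd_per_call * calls_per_day


if __name__ == "__main__":
    calls = 10_000  # e.g. tagging support tickets all day
    for tier in TIERS.values():
        print(f"{tier.name:>13}: ${daily_cost(tier, calls):,.2f} per day")
```

With these placeholder numbers the most-capable tier costs 50 times what the fast tier does for the same ten thousand calls, which is why volume matters as much as difficulty when choosing.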
Why “as of <date>” matters
Model names, context-window sizes, prices, and which tool leads each category change constantly — often within a single quarter. Anything this chapter could print as “the latest model” would be stale before you read it. So the durable skill isn’t memorizing names; it’s knowing the shape (five tool categories, three model tiers) and knowing where to check the current state: the vendor’s own models page, dated to when you look. Treat every specific name below as a snapshot to verify, never a fact to quote.
The habit transfers to everything in this module: tool-and-model facts are time-sensitive. When you write them into a doc, a prompt, or code, stamp them with the date you checked — “as of June 2026, the balanced tier is X” — so a future reader knows exactly how much to trust the claim and when to re-verify.
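One way to make the date stamp a habit in code is to store the check date next to the claim. This is a minimal sketch: the `DatedFact` class is hypothetical, and the 90-day re-verification window is an assumption loosely based on "within a single quarter", not a rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatedFact:
    claim: str
    checked_on: date  # the day you verified the claim yourself
    source: str       # where you checked it

    def is_stale(self, today: date, max_age_days: int = 90) -> bool:
        """True once the claim is older than the re-verification window."""
        # 90 days is an assumed default; pick a window that fits your docs.
        return today - self.checked_on > timedelta(days=max_age_days)

    def render(self) -> str:
        """Format the claim with its 'as of' stamp for a doc or prompt."""
        return f"As of {self.checked_on:%B %Y}, {self.claim} (source: {self.source})"


if __name__ == "__main__":
    fact = DatedFact("the balanced tier is X", date(2026, 6, 1), "vendor models page")
    print(fact.render())
    print("re-verify now?", fact.is_stale(today=date(2026, 12, 1)))
```

A reader who finds the rendered sentence months later can see at a glance how old the claim is, and a script can flag stale entries instead of relying on someone remembering to re-check.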
01 Learning objectives
02 Curated reading
03 Knowledge check
- 01 (easy): Autocomplete differs from a coding agent mainly in…
- 02 (easy): Specific model names and tiers are stable facts worth memorizing.
- 03 (medium): For a hands-off task that spans many files in a repo (edit, run, test, loop), reach for…
- 04 (medium): The most capable model is always the right choice for every task.
04 Interview questions
What gets asked on this topic, with how to approach each question, the follow-ups, and the trap.
- Name the categories of AI coding tools and when you'd reach for each.
Roughly: autocomplete (in-editor suggestions as you type — fast, line-to-block scope), chat assistant (ask questions, get explanations and snippets in a side panel), terminal/CLI agent (runs in your shell, reads/edits files and runs commands across a repo), AI IDE (an editor built around AI with the codebase in context), and app-builder (describe an app, get a scaffolded project).
Reach for autocomplete for flow while writing known code; chat for understanding or a focused snippet; a CLI agent or AI IDE for multi-file changes across a real repo; an app-builder for a quick from-scratch prototype.
Follow-ups they push on:
- When would autocomplete be the wrong tool?
- What does a CLI agent do that a chat assistant can't?
Red flag: Assuming one tool fits every task. A from-scratch prototype and a surgical multi-file refactor want different tools.
source: Anthropic, Claude Code overview
- Frontier models come in tiers. Describe them without naming specific models.
Most providers offer roughly three tiers: a fast/cheap tier (cheapest and quickest, for high-volume or simple tasks like classification and autocomplete), a balanced tier (the everyday workhorse — good quality at reasonable cost/speed), and a most-capable tier (the strongest reasoning for hard, high-stakes problems, at higher cost and latency).
Deliberately avoid pinning specific names or 'the latest model' — those change constantly. The durable skill is reasoning about the tier, then checking the provider's current model page for which name maps to it today.
Follow-ups they push on:
- Why frame this in tiers instead of memorizing model names?
- Where would you check which model is current?
Red flag: Naming a specific model as 'the best/latest'. It dates instantly. Talk in tiers and verify the current mapping at use time.
source: Anthropic, Models overview
- Is the most capable model always the right choice? Explain the tradeoff.
No — there's a cost / capability / latency tradeoff. The most-capable tier costs more per token and is slower; for simple, high-volume tasks (tagging, extraction, routing, autocomplete) a fast/cheap model is both cheaper and snappier, and just as correct.
Match the model to the task: escalate to a stronger tier only when the task's reasoning genuinely needs it. A common production pattern is to route — cheap model for the easy 90%, strong model for the hard 10%.
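The routing pattern described above can be sketched as a two-step cascade. Everything here is illustrative: `Answer`, `cascade`, and the two stand-in models are hypothetical, and the self-reported confidence score is an assumption. In practice the escalation signal might instead be a failed validation, a failed test, or a heuristic on the input.

```python
from typing import Callable, NamedTuple


class Answer(NamedTuple):
    text: str
    confidence: float  # 0.0 to 1.0; assumed to come from your own model wrapper


# A "model" here is any callable that takes a prompt and returns an Answer.
Model = Callable[[str], Answer]


def cascade(prompt: str, cheap: Model, strong: Model,
            threshold: float = 0.8) -> tuple[str, Answer]:
    """Try the cheap tier first; escalate only when it is not confident enough."""
    first = cheap(prompt)
    if first.confidence >= threshold:
        return "cheap", first
    return "strong", strong(prompt)


# Stand-in models so the sketch runs without any API; not real model calls.
def fake_cheap(prompt: str) -> Answer:
    # Pretend short prompts are easy and long ones are hard.
    return Answer("billing", 0.95 if len(prompt) < 40 else 0.4)


def fake_strong(prompt: str) -> Answer:
    return Answer("needs-human-review", 0.9)


if __name__ == "__main__":
    for prompt in ["refund please", "x" * 50]:
        tier, answer = cascade(prompt, fake_cheap, fake_strong)
        print(tier, answer.text)
```

The design choice is that the strong tier is only paid for when the cheap tier signals doubt, so the cost of the hard minority does not leak into the easy majority.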
Follow-ups they push on:
- Give a task where the cheapest tier is the right call.
- What is model 'routing' or a cascade?
Red flag: Defaulting to the biggest model for everything. It burns money and latency on tasks a small model nails.
source: Anthropic, Models overview
- Why do AI-tool and model facts come with an 'as of <date>' caveat, and how do you handle that?
The AI tooling and model landscape moves fast: names, prices, tiers, context-window sizes, and capabilities change month to month. Any specific fact you memorize ('model X is the best', 'it costs $Y') has a short shelf life, and a model's training data has a cutoff so it doesn't even know about newer models.
So you reason in durable concepts (tiers, the cost/capability tradeoff) and verify specifics against the provider's current docs at the moment you need them, rather than trusting a printed name or a number from memory.
Follow-ups they push on:
- Where do you check the current model lineup?
- Why can't you just trust the model to know the latest model names?
Red flag: Treating a model/price fact as permanent. Quote tiers and concepts, and re-verify any specific at authoring time.
source: Anthropic, Models overview
- What can a terminal/CLI coding agent do that an in-editor chat assistant can't?
A CLI/terminal agent runs in your shell with access to your whole project: it can read and edit files across the repo, run commands (tests, builds, git), see the output, and iterate — a full plan-edit-test loop on its own. A chat assistant in the editor mainly sees the snippet or file you've shared and hands back text/snippets you copy in yourself.
The difference is agency over the environment: the CLI agent acts on the real repo (multi-file refactors, running the tests it just changed), while chat is closer to a knowledgeable pair you query for explanations and focused code. That power is also why CLI agents need the review/permission discipline that chat doesn't.
What a strong answer covers:
- CLI agent: reads/edits many files, runs commands, sees output, loops autonomously.
- Chat assistant: mostly sees what you paste; returns text you apply yourself.
- Difference is agency over the real environment, not just smarter answers.
- More power → more need for review and permission gating on the CLI agent.
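The edit, run, read, repeat loop that separates an agent from chat can be sketched in a few lines. This is a simplified illustration, not how any particular tool is implemented: `propose_and_apply_edit` is a hypothetical callback standing in for the model editing files, and a real agent would add permission prompts and leave a diff for you to review.

```python
import subprocess
from typing import Callable, Sequence


def run_tests(command: Sequence[str]) -> tuple[bool, str]:
    """Run the test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def agent_loop(
    propose_and_apply_edit: Callable[[str], None],  # hypothetical: edits files given test output
    test_command: Sequence[str],
    max_iterations: int = 5,
) -> bool:
    """Edit, run the tests, read the output, and repeat until green or out of budget."""
    passed, output = run_tests(test_command)
    for _ in range(max_iterations):
        if passed:
            return True
        propose_and_apply_edit(output)  # the agent acts on the real repo here
        passed, output = run_tests(test_command)
    return passed
```

A call might look like `agent_loop(my_edit_step, ["pytest", "-q"])`, where `my_edit_step` is whatever applies the model's proposed change. The `max_iterations` cap is the scope limit: without it, a loop that never reaches green keeps editing your repo.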
Quick self-check: You want a tool to refactor a function across 12 files, run the test suite, and fix what breaks, without you copy-pasting. Which fits best?
- Autocomplete. Wrong: it suggests as you type; it can't drive a multi-file refactor or run tests.
- Chat assistant. Wrong: it returns snippets you apply manually; it doesn't act across the repo or run tests itself.
- Terminal/CLI agent. Correct: it can edit many files, run the test suite, read output, and iterate autonomously.
- Linter. Wrong: it flags issues; it doesn't perform the refactor.
Follow-ups they push on:
- Why does a CLI agent need stronger review discipline than chat?
- For a one-off 'explain this regex', which tool fits better?
Red flag: Expecting a chat assistant to actually apply a multi-file change across your repo. It returns snippets; running the change in the environment is the agent's job.
source: Anthropic, Claude Code overview
- When is an app-builder ('describe an app, get a project') the right tool, and when is it the wrong one?
An app-builder shines for getting from zero to something visible fast: prototypes, demos, throwaway internal tools, validating an idea, or learning by seeing a working scaffold. You describe what you want and get a runnable project without setup friction.
It's the wrong tool when you need to fit an existing, large codebase, follow specific conventions, or make surgical changes to production code — there a CLI agent or AI IDE working in the real repo is far better. Rule of thumb: app-builders are great at the blank-page start; once there's a real codebase and real constraints, you graduate to tools that operate inside it.
What a strong answer covers:
- Best for: prototypes, demos, throwaway tools, idea validation, fast blank-page starts.
- Worst for: surgical edits inside a large existing codebase with conventions.
- Once a real repo and constraints exist, switch to a CLI agent or AI IDE.
- Strength is zero-to-running speed, not maintaining production code.
Follow-ups they push on:
- Why is an app-builder awkward for changing an existing production app?
- What do you lose if you keep prototyping in an app-builder past the demo stage?
Red flag: Using an app-builder to evolve a serious, growing codebase. It's tuned for fresh scaffolds, not careful changes within established structure and conventions.
source: Anthropic, Claude Code overview
- How do you pick which model tier to point a coding agent at for a given task?
Match the tier to the task's reasoning demand. For hard, multi-step, high-stakes work — architecture, gnarly debugging, large refactors where a wrong move is costly — use the most-capable tier; the extra cost and latency buy correctness. For routine, well-specified work — boilerplate, simple edits, repetitive transforms, classification-like steps — a fast/cheap tier is snappier and just as correct.
A common pattern is to default to a balanced tier for everyday coding and escalate to the top tier only when a task stalls or genuinely needs deeper reasoning. The skill is reasoning about the demand, not memorizing which model name is 'best' this month — and checking the provider's current model page for which name maps to each tier today.
What a strong answer covers:
- Hard/multi-step/high-stakes → most-capable tier; correctness outweighs cost.
- Routine, well-specified work → fast/cheap tier; same result, less cost and latency.
- Common default: balanced tier for everyday work, escalate to top tier when stuck.
- Reason about reasoning-demand; verify current tier→name mapping in provider docs.
Follow-ups they push on:
- Give a coding task where the cheapest tier is the right call.
- What signals tell you to escalate from a balanced to the top tier?
Red flag: Pointing the biggest, slowest model at every task by default. You burn cost and latency on edits a cheaper tier handles perfectly.
source: Anthropic, Choosing a model
- What does it mean that a model has a 'training cutoff', and how should that change what you trust it on?
A model's knowledge is frozen at its training data cutoff — it learned from data up to roughly that date and has no inherent awareness of anything after it. So it can be confidently wrong about recent library versions, new APIs, current prices, or even newer models (including itself).
Practically: trust it for durable concepts and patterns (how REST works, what a closure is), but verify anything time-sensitive (latest package version, current API signature, today's model lineup) against live docs or by giving it the current information in context. Tools that can fetch docs or read your actual `package.json` close this gap; raw model memory does not.
What a strong answer covers:
- Knowledge is frozen at the training cutoff; nothing newer is inherently known.
- It can be confidently wrong on recent versions, APIs, prices, and newer models.
- Trust it for durable concepts; verify time-sensitive specifics against live sources.
- Giving it current docs/context or a fetch tool beats relying on its memory.
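As a small illustration of putting current information in context instead of trusting recall, the sketch below reads the version a project actually declares in `package.json` and turns it into a sentence you could paste into a prompt. The function names are hypothetical, and it only looks at the `dependencies` and `devDependencies` sections.

```python
import json
from pathlib import Path
from typing import Optional


def declared_range(package_json: Path, package: str) -> Optional[str]:
    """Return the version range declared in package.json, or None if absent."""
    manifest = json.loads(package_json.read_text(encoding="utf-8"))
    for section in ("dependencies", "devDependencies"):
        declared = manifest.get(section, {}).get(package)
        if declared is not None:
            return declared
    return None


def context_line(package_json: Path, package: str) -> str:
    """Build a sentence for the prompt so the model sees the real version."""
    declared = declared_range(package_json, package)
    if declared is None:
        return f"{package} is not declared in {package_json.name}."
    return f"This project declares {package} at {declared} in {package_json.name}."
```

Calling `context_line(Path("package.json"), "some-package")` (a placeholder package name) gives the model a fact from the repo itself, so a suggestion based on an older version from training data is easier to catch.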
Follow-ups they push on:
- Why might an agent suggest a deprecated API or an old package version?
- How does giving the model your current docs in context fix this?
Red flag: Trusting the model's recall of 'the latest' version, API, or model name. That's exactly what its cutoff makes unreliable; check current docs.
source: Anthropic, Models overview
- AI coding tools feel magical at first but stall on real codebases. What's the realistic mental model for what they're good and bad at?
Think of an AI coding tool as a fast, broadly knowledgeable, eager junior who has never seen your codebase, can't run things in their head reliably, and won't push back unless you make them. They're excellent at well-scoped, well-specified tasks with clear examples and a way to verify; they're weak at ambiguous goals, implicit context they were never given, and anything where being confidently wrong is cheap for them but expensive for you.
The realistic model: their output quality tracks the quality of your context and spec, not the tool's branding. Give relevant files, an example to match, and an acceptance check, and review the result — and they're a force multiplier. Hand them a vague wish and full autonomy, and they generate plausible code that misses the point.
What a strong answer covers:
- Strong on scoped, specified tasks with examples and a verification path.
- Weak on ambiguity, unstated context, and self-checking their own correctness.
- Output quality tracks your context/spec quality more than the tool's brand.
- Force multiplier with good prompts + review; liability with vague goals + blind trust.
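The three ingredients named in the answer (relevant files, an example to match, an acceptance check) can be treated as a checklist before you hand a task over. The `TaskBrief` class below is a hypothetical sketch of that checklist, not a format any tool requires.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Hypothetical container for what you hand an AI coding tool."""
    goal: str
    relevant_files: list[str] = field(default_factory=list)
    example_to_match: str = ""
    acceptance_check: str = ""

    def missing(self) -> list[str]:
        """Name the pieces a vague request tends to leave out."""
        gaps = []
        if not self.relevant_files:
            gaps.append("relevant files")
        if not self.example_to_match:
            gaps.append("an example to match")
        if not self.acceptance_check:
            gaps.append("an acceptance check")
        return gaps

    def to_prompt(self) -> str:
        """Assemble the brief into plain text, skipping empty sections."""
        sections = [f"Goal: {self.goal}"]
        if self.relevant_files:
            files = "\n".join(f"- {path}" for path in self.relevant_files)
            sections.append(f"Relevant files:\n{files}")
        if self.example_to_match:
            sections.append(f"Match this existing example:\n{self.example_to_match}")
        if self.acceptance_check:
            sections.append(f"Done when: {self.acceptance_check}")
        return "\n\n".join(sections)


if __name__ == "__main__":
    vague = TaskBrief(goal="make the export faster")
    print("Missing:", ", ".join(vague.missing()))
```

Running `missing()` on a one-line wish lists all three gaps, which is usually a better first debugging step than switching tools.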
Follow-ups they push on:
- Why does giving an example from the repo improve results so much?
- What's the single highest-leverage thing you can add to a weak prompt?
Red flag: Blaming the tool when results are poor. Usually the missing piece is context, a concrete example, or an acceptance check the human didn't provide.
source: Anthropic, Claude Code best practices