2.2 ★ core [J][A] 14 interview Q's

API design & alternatives

REST principles and when to choose REST vs GraphQL vs gRPC vs WebSockets, plus versioning, pagination, rate limiting, and idempotency keys.

REST is the default for a reason, but “REST vs GraphQL vs gRPC vs WebSockets” is a right-tool decision, not a religion. Pick by the shape of the problem: who the client is, how the data is fetched, and whether the connection is one-shot or live. And whatever you pick, the same unglamorous cross-cutting concerns — versioning, pagination, rate limiting, idempotency, validation, error shapes — are what separate a toy endpoint from a production API.

Key vocabulary

Resource vs RPC: REST models nouns (/orders/42) acted on by HTTP verbs; RPC (gRPC) models verbs (CreateOrder) as function calls. REST leans on HTTP semantics; RPC leans on a contract.
Statelessness: Every request carries everything the server needs; the server keeps no per-client session between calls. This is what lets you load-balance across identical instances.
HATEOAS: “Hypermedia as the engine of application state” — responses embed links to the next valid actions, so a client discovers the API instead of hardcoding URLs. The most-cited, least-implemented REST constraint; be aware of it.
Over- / under-fetching: REST endpoints return fixed shapes, so a client often gets too much (over) or must make many calls to assemble a view (under). GraphQL exists to let the client ask for exactly what it needs.
Offset vs cursor pagination: OFFSET skips N rows — simple but drifts and slows on deep pages. Cursor/keyset paginates from the last-seen key — stable and fast, but no random page jumps.

REST, the constraints that matter

REST is a set of constraints, not a framework. The ones that earn their keep:

Resources are nouns, acted on by HTTP verbs. POST /orders creates, GET /orders/42 reads, PUT/PATCH /orders/42 updates, DELETE /orders/42 removes. The verb carries the intent (and its safe/idempotent semantics from 2.1); the URL names the thing.
Use status codes honestly: 201 with a Location header on create, 404 for a missing resource, 409 for a conflict. Don’t return 200 with {"error": ...} inside.
Statelessness — each request stands alone. This isn’t pedantry: it’s why you can put N identical servers behind a load balancer and have any of them serve any request. The moment a server remembers “this client is mid-checkout,” you’ve pinned that client to that box and broken horizontal scaling.
HATEOAS is the constraint nobody fully ships — responses linking to next actions. Know the term and that it exists; you will almost never be asked to implement it.

Choosing the protocol

Style	Use when	Avoid when	Gotcha
REST	CRUD over resources, public web APIs, broad client support	you need flexible aggregated reads or sub-ms internal calls	over/under-fetching; chatty endpoints for composite views
GraphQL	many clients (mobile/web) need varied shapes; avoid over/under-fetching	simple CRUD; you can't afford query-cost governance	N+1 resolvers, caching is harder, malicious deep queries
gRPC	high-throughput internal service-to-service; streaming	browser-facing public APIs without a proxy	binary protobuf is not human-debuggable; needs HTTP/2
WebSockets	real-time bidirectional (chat, live dashboards, games)	plain request/response — adds stateful-connection overhead	stateful connections complicate load balancing & scaling

REST is the default; reach for the others when the data-flow shape demands it.

The mental shortcut: REST when a human or a third party will consume it and HTTP caching/tooling is valuable. GraphQL when many heterogeneous clients (an iOS app, a web app, a watch app) each want a different slice of the same graph and you’re tired of building one bespoke endpoint per screen. gRPC for the hot internal path between your own services, where every millisecond and byte counts and both ends ship a generated client from one .proto. WebSockets (or Server-Sent Events for one-way) when the server needs to push to the client without being polled.

The cross-cutting concerns

Whatever style you choose, production APIs need the same scaffolding:

Versioning: put the version in the URI (/v1/...), in a header via content negotiation (Accept: application/vnd.api.v2+json), or in a query parameter (?version=2). URI versioning is the bluntest but most legible; pick one and be consistent.
Pagination: offset vs cursor (below). Always paginate list endpoints; an unbounded list is a latent outage.
Rate limiting: a token bucket (a bucket refills at a steady rate; each request spends a token; empty bucket → reject) or a sliding window counter, returning 429 Too Many Requests with a Retry-After header.
Request validation: reject bad input at the edge, with 400 for a malformed request and 422 for a well-formed one that fails validation. Never trust the client.
A consistent error shape: one envelope ({ "error": { "code", "message", "details" } }) across every endpoint, so clients write one error handler, not fifty.
Idempotency keys: so a retried POST doesn’t double-charge.
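The token bucket from the rate-limiting item above can be sketched in a few lines. This is a minimal single-process illustration, not a production limiter; the class name `TokenBucket`, its parameters, and the injectable `now` clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Single-process token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.now = now                  # injectable clock (assumption, eases testing)
        self.tokens = capacity          # start full so an initial burst is allowed
        self.updated = now()

    def allow(self) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        current = self.now()
        elapsed = current - self.updated
        self.updated = current
        # Refill for the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Empty bucket: report how long until one whole token is available.
        return False, (1 - self.tokens) / self.rate
```

An HTTP layer would typically map a rejected call to 429 Too Many Requests and round the second value up into a Retry-After header. With several API servers, the bucket state would have to live in a shared store rather than in process memory.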

Idempotency key for a safe POST retry

POST isn’t idempotent, so the client supplies a unique key. The server stores the key→result and replays the original response on any retry:

POST /api/v1/payments HTTP/1.1
Idempotency-Key: 8f14e45f-ea0a-4f3d-9b2c-1a2b3c4d5e6f
Content-Type: application/json

{ "amount": 4200, "currency": "usd", "source": "card_x" }

A correct handler treats the key as a lock plus a cache, in one atomic step:

from datetime import timedelta

# `store` (a key-value store with an atomic conditional insert) and
# `charge_card` (the payment call) are assumed to exist elsewhere.
IDEMPOTENCY_TTL = timedelta(hours=24)

def handle(req):
    key = req.headers["Idempotency-Key"]

    # Atomically claim the key (a unique insert is the lock).
    if not store.try_insert(key, state="in_progress", ttl=IDEMPOTENCY_TTL):
        existing = store.get(key)
        if existing.state == "done":
            return existing.response          # replay: no second charge
        return 409  # a retry arrived while the first is still running

    result = charge_card(req.body)            # the real, non-idempotent side effect
    store.update(key, state="done", response=result)
    return result

The subtle part is the race: two retries can arrive nearly simultaneously. The try_insert (a conditional/unique-constraint write) ensures exactly one wins the claim; the loser either replays the stored response or, if the first is still in flight, gets a 409 and backs off. This is how Stripe makes a flaky network safe — the same key is processed at most once, no matter how many times the client retries.
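To make the claim-then-replay flow concrete, here is a self-contained sketch that can be run as is. Everything in it is hypothetical: `InMemoryStore` uses a lock to stand in for a unique-constraint insert, `charge_card` is a stub that only records calls, and key expiry is left out for brevity.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    state: str                      # "in_progress" or "done"
    response: Optional[Any] = None


class InMemoryStore:
    """Hypothetical stand-in for a table with a unique constraint on the key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def try_insert(self, key: str) -> bool:
        # The lock makes check-and-insert atomic, like a unique-constraint write.
        with self._lock:
            if key in self._records:
                return False
            self._records[key] = Record(state="in_progress")
            return True

    def get(self, key: str) -> Record:
        with self._lock:
            return self._records[key]

    def mark_done(self, key: str, response: Any) -> None:
        with self._lock:
            self._records[key] = Record(state="done", response=response)


charges = []                         # records every real side effect, for the demo


def charge_card(body: dict) -> dict:
    charges.append(body)             # stub for the payment provider call
    return {"status": 201, "charge_id": len(charges)}


def handle(store: InMemoryStore, key: str, body: dict) -> dict:
    if not store.try_insert(key):
        existing = store.get(key)
        if existing.state == "done":
            return existing.response             # replay the stored response
        return {"status": 409}                   # first attempt still in flight
    result = charge_card(body)
    store.mark_done(key, result)
    return result
```

Calling `handle` twice with the same key returns the same response and leaves a single entry in `charges`. A key that has been claimed but not yet completed yields the 409 branch.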

Pagination: offset vs cursor

FIG 1 · Why deep offsets hurt: OFFSET 100000 makes the DB read and discard 100k rows on every page; a cursor jumps straight to the next slice via an indexed key.
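Besides the cost of deep offsets, offset pages also drift when rows arrive between requests. The sketch below models a newest-first feed with plain lists to show the drift; the function names are made up for illustration, and an in-memory list says nothing about database performance.

```python
def page_by_offset(feed, offset, limit):
    """Newest-first page selected by position."""
    ordered = sorted(feed, reverse=True)
    return ordered[offset:offset + limit]


def page_by_cursor(feed, before_id, limit):
    """Newest-first page selected by key: ids strictly below the last one seen."""
    ordered = sorted(feed, reverse=True)
    if before_id is not None:
        ordered = [i for i in ordered if i < before_id]
    return ordered[:limit]


feed = [1, 2, 3, 4, 5, 6]                         # row ids; higher means newer

offset_page1 = page_by_offset(feed, offset=0, limit=3)        # [6, 5, 4]
cursor_page1 = page_by_cursor(feed, before_id=None, limit=3)  # [6, 5, 4]

feed.append(7)                                    # a new row arrives between requests

offset_page2 = page_by_offset(feed, offset=3, limit=3)        # [4, 3, 2]: row 4 repeats
cursor_page2 = page_by_cursor(feed, before_id=cursor_page1[-1], limit=3)  # [3, 2, 1]
```

The offset reader sees row 4 twice because the new row pushed everything down one position. The cursor reader resumes from the last id it saw, so its second page is unaffected.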

From the trenches Stripe

Stripe’s API treats idempotency as a first-class header, not an afterthought. A client sends Idempotency-Key on any POST; Stripe stores the result of the first request against that key and returns the exact same response — same status, same body — for every retry within a 24-hour window. Crucially, this includes errors: if your first attempt failed with a card decline, the retry replays the decline rather than re-running the charge.

The design lesson is that the network will drop responses you successfully processed. Your server charged the card, then the connection died before the 200 got home; the client can’t tell “never happened” from “happened but I didn’t hear back.” The idempotency key collapses that ambiguity: retry freely, and the worst case is you get the original answer again. Every serious payments, ledger, or order-creation API ends up reinventing this pattern.

read the writeup ↗ docs.stripe.com

02 Curated reading

Stripe API design — Idempotent requests
essential eng blog 15m — The reference implementation of idempotency keys for safe POST retries.
ByteByteGo — REST vs GraphQL vs gRPC
optional eng blog 12m — Visual decision framing for API styles.

03 Knowledge check

3 questions · pass ≥ 70%

01 medium
Mobile clients over-fetch and make many round-trips. Which API style best addresses this?
02 medium
High-performance internal service-to-service calls with binary payloads and streaming favour:
03 hard
Which pagination style is most stable when rows are being inserted between page requests?

04 Interview questions

What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.

Commonly asked mid design very common Design a URL-shortening API (like bit.ly). Walk me through the endpoints and the redirect.
Two core endpoints: POST /urls with the long URL returns a short code (201 + Location); GET /{code} issues a 301/302 redirect to the long URL.
Key decisions: generate the code via a base62 encoding of an auto-increment id or a hash (handle collisions); store code -> longURL in a fast KV store; cache hot codes (read-heavy workload). Discuss 301 (permanent, cacheable, loses analytics) vs 302 (temporary, every hit reaches you for click counts). Add rate limiting and custom-alias support as extensions.
Follow-ups they push on
- 301 vs 302 for the redirect — which and why?
- How do you guarantee short-code uniqueness at scale?
- How would you add click analytics without slowing the redirect?
Red flag Picking 301 then wondering why click analytics vanish — browsers cache 301 and stop hitting your server.
source: system-design-primer — Design a URL shortener ↗
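The base62 option mentioned in the answer above can be sketched as follows. The alphabet order and the function names are assumptions; any fixed 62-character alphabet works as long as encoding and decoding agree.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def encode_base62(n: int) -> str:
    """Turn a non-negative integer id into a short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Codes built from an auto-increment id are unique by construction, and seven characters cover 62**7 ids (about 3.5 trillion). They are also sequential and therefore guessable, which is one reason the hash-based alternative comes up as a follow-up.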
Commonly asked mid concept very common REST vs GraphQL vs gRPC vs WebSockets — when do you reach for each?
REST: default for public CRUD over HTTP; cacheable, simple, ubiquitous. GraphQL: client picks exactly the fields it needs — kills over/under-fetching when many clients aggregate data from many resources; cost is caching and query-complexity control. gRPC: high-performance internal service-to-service calls over HTTP/2 + protobuf, with streaming; not browser-native. WebSockets: persistent bidirectional real-time channel (chat, live feeds, multiplayer).
Choose by traffic shape: public+cacheable → REST; flexible client queries → GraphQL; fast internal RPC → gRPC; push/real-time → WebSockets.
Follow-ups they push on
- Why is GraphQL harder to cache than REST?
- Why isn't gRPC used directly from browsers?
Red flag Reaching for GraphQL or gRPC by default. For a simple public CRUD API, REST is usually the lower-friction, more cacheable choice.
source: ByteByteGo — REST vs GraphQL vs gRPC ↗
Stripe senior design very common How do idempotency keys make a payment POST safely retryable? Walk through the server logic.
The client generates a unique key (e.g. a V4 UUID) and sends it in an Idempotency-Key header. The server stores the key with the request's outcome.
Logic: on first request for a key, process it and persist the resulting status + response body keyed by that idempotency key (inside the same transaction as the side effect). On any retry with the same key, return the stored response instead of re-charging. Handle the in-flight case (a retry arriving while the first is still processing) with a lock or a 409. Stripe expires keys after 24 hours. This turns a non-idempotent POST into a safely retryable one after a timeout.
Follow-ups they push on
- Where do you store the key — same DB transaction as the charge? Why?
- What if two identical requests arrive concurrently?
Red flag Storing the idempotency record separately from the side effect, so a crash between the charge and the record leaves you able to double-charge. Persist them atomically.
source: Stripe — Designing robust APIs with idempotency ↗
Commonly asked mid concept common Offset pagination vs cursor (keyset) pagination — what breaks with offset at scale?
Offset/limit (LIMIT 20 OFFSET 10000) is simple but the database must scan and discard every skipped row, so deep pages get slow, and rows shifting between requests cause duplicates or skips.
Cursor/keyset pagination passes the last-seen sorted key (WHERE id > :lastId ORDER BY id LIMIT 20). The database seeks straight to that key in the index, so each page costs about the same regardless of depth, and rows inserted between requests don't shift the window. Tradeoff: you can't jump to an arbitrary page number. Use cursors for infinite scroll and large/active datasets.
Follow-ups they push on
- Why does offset pagination skip or duplicate rows under writes?
- How do you build a cursor over a non-unique sort column?
Red flag Using OFFSET for an infinite feed — as users scroll, new inserts shift the window and they see duplicates. Cursors avoid that.
source: Hello Interview — Pagination patterns ↗
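For the follow-up about a non-unique sort column, one common approach is to add a unique tiebreaker to the cursor. The sketch below uses an in-memory SQLite database; the `posts` table, its columns, and the `next_page` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO posts (id, created_at) VALUES (?, ?)",
    [(1, 100), (2, 100), (3, 100), (4, 200), (5, 200)],   # timestamps repeat
)
conn.execute("CREATE INDEX posts_created_id ON posts (created_at, id)")


def next_page(cursor, limit):
    """cursor is the (created_at, id) of the last row seen, or None for the first page."""
    if cursor is None:
        rows = conn.execute(
            "SELECT created_at, id FROM posts ORDER BY created_at, id LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        last_created, last_id = cursor
        # Strictly after the cursor in (created_at, id) order; id breaks ties.
        rows = conn.execute(
            "SELECT created_at, id FROM posts "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (last_created, last_created, last_id, limit),
        ).fetchall()
    next_cursor = rows[-1] if rows else None
    return rows, next_cursor
```

Paging by `created_at` alone would skip or repeat rows that share a timestamp at a page boundary. Including `id` in both the ORDER BY and the cursor gives every row a distinct position. An empty page with a `None` cursor signals the end of the list.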
Amazon · Stripe senior design very common Design a rate limiter for an API. Which algorithm would you use and why?
The token bucket is the common default — a bucket refills tokens at a fixed rate up to a capacity; each request consumes a token, and an empty bucket means the request is rejected with 429 Too Many Requests (plus a Retry-After header). It allows short bursts while bounding the average rate. ByteByteGo notes both Amazon and Stripe use this algorithm to throttle their APIs.
Alternatives: leaky bucket (smooths to a constant outflow), fixed window (simple but allows 2x bursts at window edges), and sliding window (smooths the edge problem). For a distributed limiter, keep counters in a shared store like Redis (atomic INCR with TTL) so all nodes agree.
Follow-ups they push on
- What status code and header do you return when throttled?
- How do you keep the limit consistent across many API servers?
- Why does fixed-window allow a 2x burst?
Red flag Keeping the counter in each server's local memory in a multi-node deployment — clients then get N times the limit. Use a shared/atomic store.
source: ByteByteGo — Design a rate limiter ↗
Commonly asked mid concept common How do you version a public API, and how do you evolve it without breaking clients?
Three common strategies: URI versioning (/v1/users) — explicit and cache-friendly, the most common; header versioning (Accept: application/vnd.api.v2+json) — cleaner URLs, harder to test in a browser; and query param (?version=2).
The deeper answer is to avoid breaking changes at all: add fields rather than remove, treat unknown fields as ignorable, never repurpose a field's meaning, and only bump the major version for genuinely incompatible changes. Announce deprecations with timelines and Deprecation/Sunset headers.
Follow-ups they push on
- What counts as a breaking vs non-breaking change?
- How does Stripe version without URL bumps? (dated versions pinned per account)
Red flag Bumping the version for additive changes. Adding an optional field is backward-compatible and shouldn't force clients to migrate.
source: Hello Interview — API design (versioning) ↗
Commonly asked junior concept common What does a good API error response look like, and why is a consistent error shape worth enforcing?
Use the right status code to signal the category, then a structured body with a stable machine-readable code, a human message, and optional details/field-level errors. Keep the shape identical across every endpoint so clients can handle errors generically.
Example shape: { "error": { "code": "card_declined", "message": "Your card was declined.", "details": [] } }. Stable string codes (not just HTTP numbers) let clients branch on the specific failure without parsing prose. Never leak stack traces or internal identifiers.
Follow-ups they push on
- 400 vs 422 for validation errors?
- Why include a stable `code` string alongside the HTTP status?
Red flag Returning 200 with `{ success: false }`, or varying the error body per endpoint. Clients then can't handle failures uniformly.
source: Stripe — Error handling ↗
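A small sketch of why one envelope pays off: the server builds every error the same way, and the client needs only one handler. The helper names, the 402 status used in the test, and the `rate_limited` code are illustrative assumptions, not any particular API's contract.

```python
import json
from typing import Optional


def error_response(status: int, code: str, message: str,
                   details: Optional[list] = None) -> tuple[int, str]:
    """Build (http_status, json_body) using one envelope for every endpoint."""
    body = {"error": {"code": code, "message": message, "details": details or []}}
    return status, json.dumps(body)


def handle_error(status: int, raw_body: str) -> str:
    """Hypothetical client-side handler: branches on the stable code, not the prose."""
    code = json.loads(raw_body)["error"]["code"]
    if code == "card_declined":
        return "ask for another card"
    if status == 429 or code == "rate_limited":
        return "back off and retry"
    return "show generic failure"
```

Because the shape never changes, the client can parse `error.code` without first checking which endpoint it called, and field-level problems fit in `details` without a second format.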
Commonly asked senior trick occasional What is HATEOAS, and is it actually used in practice?
HATEOAS (Hypermedia As The Engine Of Application State) is the REST constraint where responses include links to the next available actions, so the client discovers transitions dynamically ({ "_links": { "cancel": "/orders/42/cancel" } }) instead of hardcoding URLs.
In practice it's the least-adopted REST constraint — most 'REST' APIs are really HTTP+JSON without hypermedia. Be honest in interviews: know what it is and the decoupling argument, but acknowledge most teams skip it because clients are coupled to the API anyway and tooling support is thin.
Follow-ups they push on
- What would full HATEOAS buy you that plain JSON doesn't?
- What is the Richardson Maturity Model?
Red flag Claiming your API is 'fully RESTful' while having no hypermedia. On the Richardson Maturity Model that is level 2; Fielding's definition of REST requires hypermedia (level 3).
source: MDN — REST ↗
★ must-know Commonly asked senior concept common Why does the N+1 query problem hit GraphQL especially hard, and how do you fix it?
GraphQL resolvers run per-field, per-object. Fetch a list of 10 authors and then ask for each author's posts, and the naive resolver fires 1 query for the authors + N queries for the posts — the classic N+1 blowup, which gets worse as clients nest deeper.
The standard fix is a DataLoader: it batches the individual post requests made within one tick of the event loop into a single WHERE author_id IN (...) query and caches results per request. This collapses N+1 into 2 queries while keeping the per-field resolver model.
What a strong answer covers
- Per-field resolvers mean nested fields each trigger their own query.
- A list of N parents requesting a child field → 1 + N queries.
- DataLoader batches per-tick requests into one IN (...) query and caches per request.
- It's worse in GraphQL than REST because clients control nesting depth dynamically.
Quick self-check
Querying 50 users and each user's `team` name with a naive resolver issues how many DB queries, and what fixes it?
Follow-ups they push on
- Why is per-request caching (not global) the right scope for DataLoader?
- How does query-depth/complexity limiting relate to this?
Red flag Solving N+1 by eager-loading everything regardless of the query — you over-fetch and lose GraphQL's selectivity. Batch with DataLoader instead.
source: Apollo — Optimizing resolvers with DataLoader ↗
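The difference in query counts can be shown without a GraphQL server. The sketch below is a simplified, synchronous illustration of the batching idea only: a real DataLoader collects keys automatically within one tick and adds a per-request cache, both of which are omitted here. The data and function names are hypothetical.

```python
from collections import defaultdict

POSTS = [  # hypothetical rows: (post_id, author_id)
    (1, "a"), (2, "a"), (3, "b"), (4, "c"),
]
query_count = 0


def posts_for_author(author_id):
    """Naive resolver: one query per parent object."""
    global query_count
    query_count += 1
    return [pid for pid, aid in POSTS if aid == author_id]


def posts_for_authors(author_ids):
    """Batched loader: one query shaped like WHERE author_id IN (...)."""
    global query_count
    query_count += 1
    wanted = set(author_ids)
    grouped = defaultdict(list)
    for pid, aid in POSTS:
        if aid in wanted:
            grouped[aid].append(pid)
    # Return results in the same order as the requested keys.
    return [grouped[aid] for aid in author_ids]


authors = ["a", "b", "c"]

naive = [posts_for_author(a) for a in authors]
naive_queries = query_count          # one query per author

query_count = 0
batched = posts_for_authors(authors)
batched_queries = query_count        # a single query for all authors
```

Both paths return the same data; the naive path issues one query per author on top of the query that fetched the authors, while the batched path issues one query for all of them.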
Commonly asked mid concept common WebSockets vs Server-Sent Events vs long polling — how do you pick for a real-time feature?
Long polling holds an HTTP request open until there's data, then the client reconnects — works everywhere but is request-heavy and laggy. Server-Sent Events (SSE) is a one-way server→client stream over a single long-lived HTTP connection, with built-in auto-reconnect and event IDs — ideal for notifications, live scores, dashboards. WebSockets give a full-duplex bidirectional channel after an HTTP upgrade — needed when the client also pushes frequently (chat, collaborative editing, multiplayer).
Rule of thumb: server-push-only → SSE (simpler, rides plain HTTP); two-way/high-frequency → WebSockets; fallback when neither is available → long polling.
What a strong answer covers
- SSE is unidirectional (server→client), text-only, with automatic reconnection.
- WebSockets are bidirectional full-duplex after an upgrade handshake.
- Long polling is the universal but least efficient fallback.
- SSE rides ordinary HTTP (HTTP/1.1 or HTTP/2); WebSockets switch to their own protocol after the upgrade handshake.
Quick self-check
A dashboard only needs the server to push live metric updates to the browser. Best fit?
Follow-ups they push on
- Why might SSE be a better fit than WebSockets for a notifications feed?
- What HTTP mechanism upgrades a connection to a WebSocket? (101 Switching Protocols)
Red flag Reaching for WebSockets for a one-way notification stream. SSE is simpler, auto-reconnects, and rides ordinary HTTP infrastructure.
source: MDN — Server-sent events ↗
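Part of SSE's simplicity is its wire format: plain text fields, with a blank line ending each event. The helper below is a minimal sketch of that serialization; the function name and parameters are assumptions, and a real endpoint would also set the `text/event-stream` content type and keep the response open.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

The `id` field is what makes reconnection useful: the browser sends the last id it received in a `Last-Event-ID` header when it reconnects, so the server can resume from that point.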
Commonly asked mid concept occasional What makes gRPC fast, and what are the practical downsides versus REST/JSON?
gRPC rides HTTP/2 (multiplexed, persistent connections) and serializes with Protocol Buffers — a compact binary format with a strict schema, so payloads are smaller and parsing is faster than text JSON. It also generates typed client/server stubs and supports streaming in both directions.
Downsides: it's not natively callable from browsers (you need gRPC-Web + a proxy); the binary payloads aren't human-readable, so debugging needs tooling; and it adds schema/codegen overhead. That's why gRPC dominates internal service-to-service traffic while REST/JSON stays the default for public, browser-facing APIs.
What a strong answer covers
- HTTP/2 transport + binary Protocol Buffers → small payloads, fast parsing.
- Generated typed stubs and first-class bidirectional streaming.
- Not browser-native — needs gRPC-Web and a proxy.
- Binary payloads are hard to eyeball/debug versus JSON.
Follow-ups they push on
- Why can't a browser call a gRPC service directly?
- When is the protobuf schema requirement a benefit vs a burden?
Red flag Choosing gRPC for a public browser-facing API. Its lack of native browser support and opaque payloads make REST/JSON the friendlier public choice.
source: gRPC — Core concepts, architecture and lifecycle ↗
Commonly asked senior design occasional Design a bulk-create endpoint that imports 10,000 records. Sync or async, and how do you report results?
Don't process 10k records in a synchronous request — you'll hit timeouts and tie up a worker. Accept the payload, validate it cheaply, enqueue a background job, and return 202 Accepted with a job/status URL (Location: /imports/{id}). The client polls that URL (or subscribes) for progress and the final per-record outcome.
Key decisions: define partial-failure semantics (all-or-nothing transaction vs per-record results so 9,998 succeed and 2 errors are reported), make the import idempotent via a client-supplied batch key so retries don't double-import, and cap batch size with backpressure.
Follow-ups they push on
- All-or-nothing vs per-record partial success — which and why?
- How do you make the bulk import idempotent under client retries?
- What status code signals 'accepted but not yet done'? (202)
Red flag Processing the whole batch inline and returning one 200/500. Long requests time out, and a single bad row failing the entire batch is a poor contract — go async with per-record results.
source: MDN — 202 Accepted ↗
Commonly asked junior trick common Trick: what's wrong with the REST route GET /getUserById?id=5, and how should it look?
It mixes RPC-style verb-in-the-path (getUserById) with REST, which is redundant and inconsistent. In REST the HTTP method is the verb and the URL names a resource (noun). So fetching user 5 is simply GET /users/5; the GET already says 'retrieve', and /users/{id} identifies the resource.
Proper resource modeling: GET /users (list), POST /users (create), GET /users/5, PUT/PATCH /users/5 (update), DELETE /users/5. Keep verbs out of paths and use plural nouns consistently.
What a strong answer covers
- The HTTP method is the verb; the path is a noun/resource identifier.
- GET /users/5, not GET /getUserById?id=5.
- Use plural collection nouns consistently (/users, /orders).
- Verb-in-path is an RPC style, not REST.
Quick self-check
Which is the correct RESTful way to fetch the user with id 5?
Follow-ups they push on
- How would you model 'cancel an order' RESTfully? (POST /orders/5/cancel or PATCH status)
- When is an RPC-style action endpoint actually acceptable?
Red flag Putting actions/verbs in the URL (`/createUser`, `/deleteOrder`). The method conveys the action; the path names the thing.
source: MDN — REST ↗
GitHub mid concept occasional How do rate-limit response headers (X-RateLimit-* / RateLimit-*) and 429 + Retry-After help a well-behaved client?
When throttling, return 429 Too Many Requests and tell the client *how* to behave. Limit headers expose the budget: a limit, the remaining count, and a reset time (GitHub uses X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset; the IETF RateLimit draft standardizes this). On a 429 (or 503) a Retry-After header tells the client exactly how long to wait.
This lets a good client self-throttle proactively — slow down as remaining approaches zero and back off precisely after a 429 — instead of blindly hammering and guessing.
What a strong answer covers
- 429 = rate limited; pair it with Retry-After (seconds or a date).
- Limit/Remaining/Reset headers let clients pace themselves before being blocked.
- GitHub's API documents X-RateLimit-*; an IETF RateLimit header draft standardizes the pattern.
- Proactive self-throttling beats reactive retry-storms.
Follow-ups they push on
- What format can Retry-After take? (delay-seconds or an HTTP date)
- Why surface Remaining/Reset instead of only a 429?
Red flag Returning 429 with no Retry-After or budget headers, leaving clients to guess and retry-storm. Tell them how long to wait and how much budget remains.
source: GitHub REST API — Rate limits ↗
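On the client side, honoring Retry-After means handling both of its formats. The sketch below uses only the standard library; the function name and the explicit `now` parameter are assumptions chosen to keep the example deterministic.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str, now: datetime) -> float:
    """Convert a Retry-After header value into seconds to wait."""
    value = value.strip()
    if value.isdigit():
        return float(value)                       # delay-seconds form
    when = parsedate_to_datetime(value)           # HTTP-date form
    if when.tzinfo is None:
        # Treat a date without zone information as UTC (assumption).
        when = when.replace(tzinfo=timezone.utc)
    # A date already in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

A caller would pass `datetime.now(timezone.utc)` as `now`, sleep for the returned duration, and ideally add a little jitter so that many throttled clients do not all retry at the same instant. An unparseable header value raises an exception, which the caller should treat as "no hint given".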