2.2 ★ core [J][A] 14 interview Q's

API design & alternatives

REST principles and when to choose REST vs GraphQL vs gRPC vs WebSockets, plus versioning, pagination, rate limiting, and idempotency keys.

REST is the default for a reason, but “REST vs GraphQL vs gRPC vs WebSockets” is a right-tool decision, not a religion. Pick by the shape of the problem: who the client is, how the data is fetched, and whether the connection is one-shot or live. And whatever you pick, the same unglamorous cross-cutting concerns — versioning, pagination, rate limiting, idempotency, validation, error shapes — are what separate a toy endpoint from a production API.

REST, the constraints that matter

REST is a set of constraints, not a framework. The ones that earn their keep:

  • Resources are nouns, acted on by HTTP verbs. POST /orders creates, GET /orders/42 reads, PUT/PATCH /orders/42 updates, DELETE /orders/42 removes. The verb carries the intent (and its safe/idempotent semantics from 2.1); the URL names the thing.
  • Use status codes honestly: 201 with a Location header on create, 404 for a missing resource, 409 for a conflict. Don’t return 200 with {"error": ...} inside.
  • Statelessness — each request stands alone. This isn’t pedantry: it’s why you can put N identical servers behind a load balancer and have any of them serve any request. The moment a server remembers “this client is mid-checkout,” you’ve pinned that client to that box and broken horizontal scaling.
  • HATEOAS is the constraint nobody fully ships — responses linking to next actions. Know the term and that it exists; you will almost never be asked to implement it.
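
The noun-plus-verb and honest-status-code rules above can be sketched as a pair of handlers. This is a minimal illustration, not a framework: the in-memory `orders` dict, the `client_ref` field, and the error codes are all hypothetical names chosen for the example.

```python
import itertools
from http import HTTPStatus

# Hypothetical in-memory store standing in for a database table.
orders: dict = {}
_ids = itertools.count(1)

def create_order(body: dict):
    """POST /orders: 201 plus a Location header, or 409 on a conflict."""
    ref = body.get("client_ref")
    # A duplicate client reference conflicts with existing state, so say so.
    if ref is not None and any(o.get("client_ref") == ref for o in orders.values()):
        return HTTPStatus.CONFLICT, {}, {"error": {"code": "duplicate_order"}}
    order_id = next(_ids)
    orders[order_id] = {"id": order_id, **body}
    return HTTPStatus.CREATED, {"Location": f"/orders/{order_id}"}, orders[order_id]

def get_order(order_id: int):
    """GET /orders/{id}: 200 with the resource, or 404 if it does not exist."""
    if order_id not in orders:
        return HTTPStatus.NOT_FOUND, {}, {"error": {"code": "order_not_found"}}
    return HTTPStatus.OK, {}, orders[order_id]
```

Each handler returns a (status, headers, body) tuple, so the failure is visible in the status line rather than hidden inside a 200 body.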

Choosing the protocol

| Style | Use when | Avoid when | Gotcha |
|---|---|---|---|
| REST | CRUD over resources, public web APIs, broad client support | you need flexible aggregated reads or sub-ms internal calls | over/under-fetching; chatty endpoints for composite views |
| GraphQL | many clients (mobile/web) need varied shapes; avoid over/under-fetching | simple CRUD; you can't afford query-cost governance | N+1 resolvers, caching is harder, malicious deep queries |
| gRPC | high-throughput internal service-to-service; streaming | browser-facing public APIs without a proxy | binary protobuf is not human-debuggable; needs HTTP/2 |
| WebSockets | real-time bidirectional (chat, live dashboards, games) | plain request/response (adds stateful-connection overhead) | stateful connections complicate load balancing & scaling |

REST is the default; reach for the others when the data-flow shape demands it.

The mental shortcut: REST when a human or a third party will consume it and HTTP caching/tooling is valuable. GraphQL when many heterogeneous clients (an iOS app, a web app, a watch app) each want a different slice of the same graph and you’re tired of building one bespoke endpoint per screen. gRPC for the hot internal path between your own services, where every millisecond and byte counts and both ends ship a generated client from one .proto. WebSockets (or Server-Sent Events for one-way) when the server needs to push to the client without being polled.

The cross-cutting concerns

Whatever style you choose, production APIs need the same scaffolding:

  • Versioning: in the URI (/v1/...), in a header via content negotiation (Accept: application/vnd.api.v2+json), or in a query parameter (?version=2). URI versioning is the bluntest but most legible; pick one and be consistent.
  • Pagination — offset vs cursor (below). Always paginate list endpoints; an unbounded list is a latent outage.
  • Rate limiting — a token bucket (a bucket refills at a steady rate; each request spends a token; empty bucket → reject) or a sliding window counter, returning 429 Too Many Requests with a Retry-After.
  • Request validation: reject malformed input at the edge with 400 (or 422 when the body parses but fails semantic checks). Never trust the client.
  • A consistent error shape — one envelope ({ "error": { "code", "message", "details" } }) across every endpoint, so clients write one error handler, not fifty.
  • Idempotency keys — so a retried POST doesn’t double-charge.

Idempotency key for a safe POST retry

POST isn’t idempotent, so the client supplies a unique key. The server stores the key→result and replays the original response on any retry:

POST /api/v1/payments HTTP/1.1
Idempotency-Key: 8f14e45f-ea0a-4f3d-9b2c-1a2b3c4d5e6f
Content-Type: application/json

{ "amount": 4200, "currency": "usd", "source": "card_x" }

A correct handler treats the key as a lock plus a cache, in one atomic step:

from datetime import timedelta

# `store` (a key-value store with an atomic insert-if-absent) and `charge_card`
# are assumed to be provided by the surrounding application.
def handle(req):
    key = req.headers["Idempotency-Key"]

    # Atomically claim the key (a unique insert is the lock).
    if not store.try_insert(key, state="in_progress", ttl=timedelta(hours=24)):
        existing = store.get(key)
        if existing.state == "done":
            return existing.response          # replay: no second charge
        return 409  # a retry arrived while the first is still running

    result = charge_card(req.body)            # the real, non-idempotent side effect
    store.update(key, state="done", response=result)
    return result

The subtle part is the race: two retries can arrive nearly simultaneously. The try_insert (a conditional/unique-constraint write) ensures exactly one wins the claim; the loser either replays the stored response or, if the first is still in flight, gets a 409 and backs off. This is how Stripe makes a flaky network safe — the same key is processed at most once, no matter how many times the client retries.
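
To make the claim-then-replay flow concrete, here is a self-contained sketch that can be run as-is. It assumes an in-memory store guarded by a `threading.Lock` in place of a real unique-constraint write, and it omits the TTL; `InMemoryStore`, `Record`, and the injected `charge` callable are hypothetical names for illustration, not any provider's actual implementation.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Record:
    state: str = "in_progress"
    response: Any = None

class InMemoryStore:
    """Hypothetical stand-in for a table with a unique constraint on the key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: dict = {}

    def try_insert(self, key: str) -> bool:
        with self._lock:                      # check-and-insert is one atomic step
            if key in self._rows:
                return False
            self._rows[key] = Record()
            return True

    def get(self, key: str) -> Record:
        with self._lock:
            return self._rows[key]

    def complete(self, key: str, response: Any) -> None:
        with self._lock:
            self._rows[key] = Record("done", response)

def handle(key: str, body: dict, store: InMemoryStore, charge: Callable[[dict], dict]):
    if not store.try_insert(key):
        existing = store.get(key)
        if existing.state == "done":
            return 200, existing.response     # replay the stored response
        return 409, {"error": {"code": "request_in_progress"}}
    response = charge(body)                   # the side effect runs once per key
    store.complete(key, response)
    return 200, response
```

Calling `handle` twice with the same key invokes `charge` only once; a key that has been claimed but not completed yields 409.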

Pagination: offset vs cursor

FIG 1 · why deep offsets hurt. Top: an offset query (OFFSET 100000 LIMIT 20) scans a long bar of rows, reads and throws away the first 100,000, and keeps 20. Bottom: a cursor query (WHERE id < :last_seen LIMIT 20) seeks directly to the last-seen id through the index, with no scan, and reads only the next 20 rows.
OFFSET 100000 makes the DB read and discard 100k rows every page; a cursor jumps straight to the next slice via an indexed key.
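
The difference also shows up in correctness, not just speed. The sketch below uses an in-memory SQLite table (the `events` table and both helper functions are assumptions made for this example) to show what each style returns for page two when a new row is inserted between page requests.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO events (id, body) VALUES (?, ?)",
                 [(i, f"event {i}") for i in range(1, 101)])

def page_by_offset(offset: int, limit: int = 20) -> list:
    # The database must walk past `offset` rows before returning anything.
    rows = conn.execute(
        "SELECT id FROM events ORDER BY id DESC LIMIT ? OFFSET ?", (limit, offset))
    return [r[0] for r in rows]

def page_by_cursor(last_seen: Optional[int], limit: int = 20) -> list:
    # Seek straight to the last-seen id via the primary-key index.
    if last_seen is None:
        rows = conn.execute(
            "SELECT id FROM events ORDER BY id DESC LIMIT ?", (limit,))
    else:
        rows = conn.execute(
            "SELECT id FROM events WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen, limit))
    return [r[0] for r in rows]

first_page = page_by_cursor(None)                  # ids 100 down to 81
conn.execute("INSERT INTO events (id, body) VALUES (101, 'new event')")
offset_page_two = page_by_offset(20)               # window shifted by the insert
cursor_page_two = page_by_cursor(first_page[-1])   # continues from id 81
```

After the insert, the offset page repeats id 81 from the first page, while the cursor page continues cleanly from id 80.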

01 Learning objectives

02 Curated reading

03 Knowledge check

3 questions · pass ≥ 70%

  1. (medium) Mobile clients over-fetch and make many round-trips. Which API style best addresses this?

  2. (medium) High-performance internal service-to-service calls with binary payloads and streaming favour:

  3. (hard) Which pagination style is most stable when rows are being inserted between page requests?

04 Interview questions

What gets asked on this topic: each card covers how to approach the question, the follow-ups, and the trap. Company tags are best-effort & sourced.

  • [Commonly asked · mid · design · very common] Design a URL-shortening API (like bit.ly). Walk me through the endpoints and the redirect.

    Two core endpoints: POST /urls with the long URL returns a short code (201 + Location); GET /{code} issues a 301/302 redirect to the long URL.

    Key decisions: generate the code via a base62 encoding of an auto-increment id or a hash (handle collisions); store code -> longURL in a fast KV store; cache hot codes (read-heavy workload). Discuss 301 (permanent, cacheable, loses analytics) vs 302 (temporary, every hit reaches you for click counts). Add rate limiting and custom-alias support as extensions.

    Red flag Picking 301 then wondering why click analytics vanish — browsers cache 301 and stop hitting your server.

    source: system-design-primer — Design a URL shortener ↗
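
The base62 option mentioned in this answer can be sketched in a few lines. The alphabet ordering below (digits, then lowercase, then uppercase) is an arbitrary choice for the example; any fixed ordering of 62 symbols works as long as the encoder and decoder agree.

```python
import string

# 62 URL-safe symbols; the ordering is an assumption made for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Turn an auto-increment id into a short code."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(code: str) -> int:
    """Recover the id from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the code is derived from a unique id, there are no collisions to handle, at the price of codes being guessable in sequence. A hash-based code avoids that but needs a collision check.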
  • [Commonly asked · mid · concept · very common] REST vs GraphQL vs gRPC vs WebSockets — when do you reach for each?

    REST: default for public CRUD over HTTP; cacheable, simple, ubiquitous. GraphQL: client picks exactly the fields it needs — kills over/under-fetching when many clients aggregate data from many resources; cost is caching and query-complexity control. gRPC: high-performance internal service-to-service calls over HTTP/2 + protobuf, with streaming; not browser-native. WebSockets: persistent bidirectional real-time channel (chat, live feeds, multiplayer).

    Choose by traffic shape: public+cacheable → REST; flexible client queries → GraphQL; fast internal RPC → gRPC; push/real-time → WebSockets.

    Red flag Reaching for GraphQL or gRPC by default. For a simple public CRUD API, REST is usually the lower-friction, more cacheable choice.

    source: ByteByteGo — REST vs GraphQL vs gRPC ↗
  • [Stripe · senior · design · very common] How do idempotency keys make a payment POST safely retryable? Walk through the server logic.

    The client generates a unique key (e.g. a V4 UUID) and sends it in an Idempotency-Key header. The server stores the key with the request's outcome.

    Logic: on first request for a key, process it and persist the resulting status + response body keyed by that idempotency key (inside the same transaction as the side effect). On any retry with the same key, return the stored response instead of re-charging. Handle the in-flight case (a retry arriving while the first is still processing) with a lock or a 409. Stripe expires keys after 24 hours. This turns a non-idempotent POST into a safely retryable one after a timeout.

    Red flag Storing the idempotency record separately from the side effect, so a crash between the charge and the record leaves you able to double-charge. Persist them atomically.

    source: Stripe — Designing robust APIs with idempotency ↗
  • [Commonly asked · mid · concept · common] Offset pagination vs cursor (keyset) pagination — what breaks with offset at scale?

    Offset/limit (LIMIT 20 OFFSET 10000) is simple but the database must scan and discard every skipped row, so deep pages get slow, and rows shifting between requests cause duplicates or skips.

    Cursor/keyset pagination passes the last-seen sorted key (WHERE id > :lastId ORDER BY id LIMIT 20). It uses the index directly, so performance is constant regardless of depth, and it is stable under inserts. Tradeoff: you can't jump to an arbitrary page number. Use cursors for infinite scroll and large/active datasets.

    Red flag Using OFFSET for an infinite feed — as users scroll, new inserts shift the window and they see duplicates. Cursors avoid that.

    source: Hello Interview — Pagination patterns ↗
  • [Amazon, Stripe · senior · design · very common] Design a rate limiter for an API. Which algorithm would you use and why?

    The token bucket is the common default — a bucket refills tokens at a fixed rate up to a capacity; each request consumes a token, and an empty bucket means the request is rejected with 429 Too Many Requests (plus a Retry-After header). It allows short bursts while bounding the average rate. ByteByteGo notes both Amazon and Stripe use this algorithm to throttle their APIs.

    Alternatives: leaky bucket (smooths to a constant outflow), fixed window (simple but allows 2x bursts at window edges), and sliding window (smooths the edge problem). For a distributed limiter, keep counters in a shared store like Redis (atomic INCR with TTL) so all nodes agree.

    Red flag Keeping the counter in each server's local memory in a multi-node deployment — clients then get N times the limit. Use a shared/atomic store.

    source: ByteByteGo — Design a rate limiter ↗
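
A single-process token bucket, as described in this answer, might look like the sketch below. The injectable `clock` parameter is an assumption added so the behaviour can be tested deterministically, and a multi-node deployment would still need the shared store the answer calls for.

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends tokens."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity            # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Not enough tokens: report how long until the deficit refills.
        return False, (cost - self.tokens) / self.rate
```

When `allow()` returns False, the second value is a natural source for the Retry-After header on the 429 response.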
  • [Commonly asked · mid · concept · common] How do you version a public API, and how do you evolve it without breaking clients?

    Three common strategies: URI versioning (/v1/users) — explicit and cache-friendly, the most common; header versioning (Accept: application/vnd.api.v2+json) — cleaner URLs, harder to test in a browser; and query param (?version=2).

    The deeper answer is to avoid breaking changes at all: add fields rather than remove, treat unknown fields as ignorable, never repurpose a field's meaning, and only bump the major version for genuinely incompatible changes. Announce deprecations with timelines and Deprecation/Sunset headers.

    Red flag Bumping the version for additive changes. Adding an optional field is backward-compatible and shouldn't force clients to migrate.

    source: Hello Interview — API design (versioning) ↗
  • [Commonly asked · junior · concept · common] What does a good API error response look like, and why is a consistent error shape worth enforcing?

    Use the right status code to signal the category, then a structured body with a stable machine-readable code, a human message, and optional details/field-level errors. Keep the shape identical across every endpoint so clients can handle errors generically.

    Example shape: { "error": { "code": "card_declined", "message": "Your card was declined.", "details": [] } }. Stable string codes (not just HTTP numbers) let clients branch on the specific failure without parsing prose. Never leak stack traces or internal identifiers.

    Red flag Returning 200 with `{ success: false }`, or varying the error body per endpoint. Clients then can't handle failures uniformly.

    source: Stripe — Error handling ↗
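
One way to keep the shape identical across endpoints is to build every error through a single helper. The sketch below is illustrative only; the `error_response` name and the example codes are assumptions, not taken from any particular API.

```python
import json
from http import HTTPStatus

def error_response(status: HTTPStatus, code: str, message: str, details=None):
    """Build the one error envelope every endpoint returns."""
    body = {
        "error": {
            "code": code,                 # stable, machine-readable string
            "message": message,           # human-readable, safe to display
            "details": details or [],     # optional field-level errors
        }
    }
    return int(status), {"Content-Type": "application/json"}, json.dumps(body)

status, headers, raw = error_response(
    HTTPStatus.BAD_REQUEST,
    "validation_failed",
    "The request body is invalid.",
    details=[{"field": "amount", "issue": "must be positive"}],
)
```

Clients can then branch on `error.code` with one handler, regardless of which endpoint produced the failure.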
  • [Commonly asked · senior · trick · occasional] What is HATEOAS, and is it actually used in practice?

    HATEOAS (Hypermedia As The Engine Of Application State) is the REST constraint where responses include links to the next available actions, so the client discovers transitions dynamically ({ "_links": { "cancel": "/orders/42/cancel" } }) instead of hardcoding URLs.

    In practice it's the least-adopted REST constraint — most 'REST' APIs are really HTTP+JSON without hypermedia. Be honest in interviews: know what it is and the decoupling argument, but acknowledge most teams skip it because clients are coupled to the API anyway and tooling support is thin.

    Red flag Claiming your API is 'fully RESTful' while having no hypermedia: on the Richardson Maturity Model that's level 2, and Fielding himself has argued that an API without hypermedia is not REST.

    source: MDN — REST ↗
  • [★ must-know · Commonly asked · senior · concept · common] Why does the N+1 query problem hit GraphQL especially hard, and how do you fix it?

    GraphQL resolvers run per-field, per-object. Fetch a list of 10 authors and then ask for each author's posts, and the naive resolver fires 1 query for the authors + N queries for the posts — the classic N+1 blowup, which gets worse as clients nest deeper.

    The standard fix is a DataLoader: it batches the individual post requests made within one tick of the event loop into a single WHERE author_id IN (...) query and caches results per request. This collapses N+1 into 2 queries while keeping the per-field resolver model.

    What a strong answer covers
    • Per-field resolvers mean nested fields each trigger their own query.

    • A list of N parents requesting a child field → 1 + N queries.

    • DataLoader batches per-tick requests into one IN (...) query and caches per request.

    • It's worse in GraphQL than REST because clients control nesting depth dynamically.

    Quick self-check

    Querying 50 users and each user's `team` name with a naive resolver issues how many DB queries, and what fixes it?

    Red flag Solving N+1 by eager-loading everything regardless of the query — you over-fetch and lose GraphQL's selectivity. Batch with DataLoader instead.

    source: Apollo — Optimizing resolvers with DataLoader ↗
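
The query counts in this answer can be checked with a small experiment. This is not the DataLoader library and it does no per-tick scheduling; it only demonstrates the batching idea (one IN (...) query in place of N lookups) against a hypothetical SQLite schema, with a `stats` counter added to count queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, (i % 10) + 1, f"post {i}") for i in range(1, 31)])

stats = {"queries": 0}

def run(sql: str, params=()):
    stats["queries"] += 1                 # count every round-trip to the database
    return conn.execute(sql, params).fetchall()

def posts_naive() -> dict:
    # One query for the authors, then one more per author: 1 + N.
    authors = run("SELECT id FROM authors ORDER BY id")
    return {a: [t for (t,) in run(
                "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (a,))]
            for (a,) in authors}

def posts_batched() -> dict:
    # One query for the authors, one IN (...) query for all their posts: 2 total.
    author_ids = [a for (a,) in run("SELECT id FROM authors ORDER BY id")]
    placeholders = ",".join("?" * len(author_ids))
    rows = run(f"SELECT author_id, title FROM posts "
               f"WHERE author_id IN ({placeholders}) ORDER BY id", author_ids)
    grouped = {a: [] for a in author_ids}
    for author_id, title in rows:
        grouped[author_id].append(title)
    return grouped
```

With 10 authors, `posts_naive()` issues 11 queries and `posts_batched()` issues 2, and both return the same mapping.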
  • [Commonly asked · mid · concept · common] WebSockets vs Server-Sent Events vs long polling — how do you pick for a real-time feature?

    Long polling holds an HTTP request open until there's data, then the client reconnects — works everywhere but is request-heavy and laggy. Server-Sent Events (SSE) is a one-way server→client stream over a single long-lived HTTP connection, with built-in auto-reconnect and event IDs — ideal for notifications, live scores, dashboards. WebSockets give a full-duplex bidirectional channel after an HTTP upgrade — needed when the client also pushes frequently (chat, collaborative editing, multiplayer).

    Rule of thumb: server-push-only → SSE (simpler, rides plain HTTP); two-way/high-frequency → WebSockets; fallback when neither is available → long polling.

    What a strong answer covers
    • SSE is unidirectional (server→client), text-only, with automatic reconnection.

    • WebSockets are bidirectional full-duplex after an upgrade handshake.

    • Long polling is the universal but least efficient fallback.

    • SSE rides ordinary HTTP (HTTP/1.1 or HTTP/2); WebSockets switch to their own protocol after the upgrade handshake.

    Quick self-check

    A dashboard only needs the server to push live metric updates to the browser. Best fit?

    Red flag Reaching for WebSockets for a one-way notification stream. SSE is simpler, auto-reconnects, and rides ordinary HTTP infrastructure.

    source: MDN — Server-sent events ↗
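
Part of why SSE is simple is its wire format: plain text fields terminated by a blank line, served with Content-Type: text/event-stream. The helper below is a sketch of that framing; the `format_sse` name and the sample event are assumptions made for illustration.

```python
from typing import Optional

def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Frame one server-sent event as text/event-stream lines."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")       # lets the browser resume after a reconnect
    if event is not None:
        lines.append(f"event: {event}")       # named event type for the client listener
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")        # multi-line payloads repeat the data field
    return "\n".join(lines) + "\n\n"          # a blank line ends the event

frame = format_sse("cpu=0.42", event="metric", event_id="7")
```

The server writes one such frame to the open response each time it has something to push.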
  • [Commonly asked · mid · concept · occasional] What makes gRPC fast, and what are the practical downsides versus REST/JSON?

    gRPC rides HTTP/2 (multiplexed, persistent connections) and serializes with Protocol Buffers — a compact binary format with a strict schema, so payloads are smaller and parsing is faster than text JSON. It also generates typed client/server stubs and supports streaming in both directions.

    Downsides: it's not natively callable from browsers (you need gRPC-Web + a proxy); the binary payloads aren't human-readable, so debugging needs tooling; and it adds schema/codegen overhead. That's why gRPC dominates internal service-to-service traffic while REST/JSON stays the default for public, browser-facing APIs.

    What a strong answer covers
    • HTTP/2 transport + binary Protocol Buffers → small payloads, fast parsing.

    • Generated typed stubs and first-class bidirectional streaming.

    • Not browser-native — needs gRPC-Web and a proxy.

    • Binary payloads are hard to eyeball/debug versus JSON.

    Red flag Choosing gRPC for a public browser-facing API. Its lack of native browser support and opaque payloads make REST/JSON the friendlier public choice.

    source: gRPC — Core concepts, architecture and lifecycle ↗
  • [Commonly asked · senior · design · occasional] Design a bulk-create endpoint that imports 10,000 records. Sync or async, and how do you report results?

    Don't process 10k records in a synchronous request — you'll hit timeouts and tie up a worker. Accept the payload, validate it cheaply, enqueue a background job, and return 202 Accepted with a job/status URL (Location: /imports/{id}). The client polls that URL (or subscribes) for progress and the final per-record outcome.

    Key decisions: define partial-failure semantics (all-or-nothing transaction vs per-record results so 9,998 succeed and 2 errors are reported), make the import idempotent via a client-supplied batch key so retries don't double-import, and cap batch size with backpressure.

    Red flag Processing the whole batch inline and returning one 200/500. Long requests time out, and a single bad row failing the entire batch is a poor contract — go async with per-record results.

    source: MDN — 202 Accepted ↗
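
The accept-then-process contract from this answer can be sketched as follows. Everything here is hypothetical scaffolding: the in-memory `jobs` dict stands in for a queue plus a database, `run_job` stands in for a background worker, and the "record must have an email" rule is an invented validation used only to produce a per-record failure.

```python
import uuid

jobs: dict = {}                 # job id -> job state
jobs_by_batch_key: dict = {}    # client-supplied batch key -> job id

def submit_import(batch_key: str, records: list):
    """POST /imports: enqueue the work and return 202 with a status URL."""
    if batch_key in jobs_by_batch_key:
        job_id = jobs_by_batch_key[batch_key]      # retry: reuse the job, no double import
    else:
        job_id = str(uuid.uuid4())
        jobs[job_id] = {"status": "queued", "records": records, "results": []}
        jobs_by_batch_key[batch_key] = job_id
    body = {"id": job_id, "status": jobs[job_id]["status"]}
    return 202, {"Location": f"/imports/{job_id}"}, body

def run_job(job_id: str) -> None:
    """Background worker: record one outcome per input row."""
    job = jobs[job_id]
    job["status"] = "running"
    for index, record in enumerate(job["records"]):
        if "email" in record:
            job["results"].append({"index": index, "ok": True})
        else:
            job["results"].append({"index": index, "ok": False, "error": "missing_email"})
    job["status"] = "done"
```

The client polls the Location URL until the status is "done", then reads the per-record results instead of receiving a single pass or fail for the whole batch.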
  • [Commonly asked · junior · trick · common] Trick: what's wrong with the REST route GET /getUserById?id=5, and how should it look?

    It mixes RPC-style verb-in-the-path (getUserById) with REST, which is redundant and inconsistent. In REST the HTTP method is the verb and the URL names a resource (noun). So fetching user 5 is simply GET /users/5; the GET already says 'retrieve', and /users/{id} identifies the resource.

    Proper resource modeling: GET /users (list), POST /users (create), GET /users/5, PUT/PATCH /users/5 (update), DELETE /users/5. Keep verbs out of paths and use plural nouns consistently.

    What a strong answer covers
    • The HTTP method is the verb; the path is a noun/resource identifier.

    • GET /users/5, not GET /getUserById?id=5.

    • Use plural collection nouns consistently (/users, /orders).

    • Verb-in-path is an RPC style, not REST.

    Quick self-check

    Which is the correct RESTful way to fetch the user with id 5?

    Red flag Putting actions/verbs in the URL (`/createUser`, `/deleteOrder`). The method conveys the action; the path names the thing.

    source: MDN — REST ↗
  • [GitHub · mid · concept · occasional] How do rate-limit response headers (X-RateLimit-* / RateLimit-*) and 429 + Retry-After help a well-behaved client?

    When throttling, return 429 Too Many Requests and tell the client *how* to behave. Limit headers expose the budget: a limit, the remaining count, and a reset time (GitHub uses X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset; the IETF RateLimit draft standardizes this). On a 429 (or 503) a Retry-After header tells the client exactly how long to wait.

    This lets a good client self-throttle proactively — slow down as remaining approaches zero and back off precisely after a 429 — instead of blindly hammering and guessing.

    What a strong answer covers
    • 429 = rate limited; pair it with Retry-After (seconds or a date).

    • Limit/Remaining/Reset headers let clients pace themselves before being blocked.

    • GitHub's API documents X-RateLimit-*; an IETF RateLimit header draft standardizes the pattern.

    • Proactive self-throttling beats reactive retry-storms.

    Red flag Returning 429 with no Retry-After or budget headers, leaving clients to guess and retry-storm. Tell them how long to wait and how much budget remains.

    source: GitHub REST API — Rate limits ↗
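
On the client side, honouring Retry-After means handling both of its forms: a number of seconds or an HTTP date. The sketch below assumes the response headers arrive as a plain dict keyed by the exact name "Retry-After"; the `retry_delay_seconds` helper is a hypothetical name, and real HTTP clients usually expose case-insensitive header lookups.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def retry_delay_seconds(headers: dict, now: Optional[datetime] = None) -> Optional[float]:
    """Return how long to wait before retrying, or None if the header is absent or unusable."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)                        # delta-seconds form, e.g. "30"
    now = now or datetime.now(timezone.utc)
    try:
        when = parsedate_to_datetime(value)        # HTTP-date form
        return max(0.0, (when - now).total_seconds())
    except (TypeError, ValueError):
        return None                                # unparseable: caller picks its own backoff
```

A client would sleep for the returned delay after a 429 and fall back to its own exponential backoff when the result is None.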