NoSQL
The four NoSQL families and when each fits, BASE vs ACID, and sharding vs replication vs partitioning.
“NoSQL” isn’t one thing — it’s four families of databases that each drop some relational guarantees to win something else (scale, flexibility, or a data shape SQL handles badly). This chapter is about matching the family to the workload, and understanding the consistency and scaling trade-offs underneath them all.
The four families
The mistake is treating NoSQL as a monolith. Each family is shaped around a different access pattern — pick by how you read and write, not by hype.
| Family | Data shape | Reach for it when | Examples |
|---|---|---|---|
| Document | JSON-like documents, flexible schema | content, catalogs, user profiles — nested data read/written whole | MongoDB, Couchbase |
| Key-value | opaque value behind a key | caching, sessions, leaderboards — fastest by-key access | Redis, DynamoDB |
| Wide-column | partitioned rows, flexible columns | time-series, IoT, write-heavy at massive scale | Cassandra, HBase |
| Graph | nodes + edges | social graphs, fraud rings, recommendations — relationship traversal | Neo4j, Neptune |
A few sharper framings:
- Document shines when your entity is naturally a nested blob you load and save as a unit (an order with its line items embedded), and the schema evolves often. You trade ad-hoc multi-table joins for read locality. The flip side: data you embed in two documents is duplicated, so a fact that changes must be updated in every copy — the denormalization trade from chapter 3.5, now baked into the model.
- Key-value is the simplest model and the fastest:
O(1)by key, nothing else. It’s the workhorse behind caches (chapter 2.8), session stores, and leaderboards (Redis sorted sets). You cannot query by value — if you need “all users in Brazil,” a pure key-value store can’t help; you’d maintain that index yourself. - Wide-column is built for write throughput across a cluster. Cassandra has no joins and you design the table per query — the opposite of normalization. You duplicate data into one table per access pattern, because a write is cheap and a scan across partitions is the thing you must never do. Great for time-series/IoT firehoses.
- Graph wins when the relationships are the point. “Friends of friends who bought X” is a cheap traversal in Neo4j (hop edge to edge) and an expensive pile of self-joins in SQL that gets worse with each degree of separation.
BASE vs ACID
Relational databases give you ACID (chapter 3.5): strong, immediate consistency. Many distributed NoSQL stores instead offer BASE:
- Basically Available — the system stays up and answers, even during partitions (it favors availability).
- Soft state — state may change over time without new input, as replicas converge.
- Eventually consistent — given no new writes, all replicas eventually agree; but a read right after a write might return stale data.
This is the CAP theorem (chapter 2.7) playing out: under a network partition you must choose consistency or availability, and BASE systems lean toward availability. The trade is concrete — a value you just wrote on one node may not appear on another for a moment.
| ACID (relational) | BASE (many NoSQL) | |
|---|---|---|
| Consistency | strong, immediate | eventual |
| Availability under partition | may reject writes | stays available |
| Best for | money, inventory, anything that must be exactly right | feeds, analytics, caches, high-write at scale |
| Mental model | correctness first | availability + scale first |
Scaling: sharding vs replication vs partitioning
These three terms get used interchangeably and shouldn’t be — they solve different problems:
| Technique | What it does | Scales | Cost |
|---|---|---|---|
| Replication | copies of the data on multiple servers (leader-follower) | reads + availability (HA) | doesn't scale writes; replica lag |
| Sharding | splits data across servers by a shard key | writes + storage | cross-shard queries/joins get hard |
| Partitioning | splits one dataset into parts (rows = horizontal, columns = vertical) | manageability + query pruning | choosing the partition key |
- Replication keeps full copies of the data on multiple nodes, usually leader-follower: writes go to the leader and stream to followers. It buys you high availability (a follower is promoted if the leader dies) and read scaling (route reads to read replicas). It does not scale writes — every write still hits the single leader — and followers may lag.
- Sharding splits the data across nodes by a shard key (e.g. customers A–M on shard 1, N–Z on shard 2). Each node owns a slice, so writes and storage scale horizontally. The cost: queries spanning shards (and joins across them) become expensive or impossible, and a bad shard key creates hotspots — pick a key that spreads load evenly, never a monotonically increasing one like a timestamp (all today’s writes land on one shard).
- Partitioning is splitting a single dataset into parts. Horizontal partitioning splits by rows (often the same idea as sharding when the parts live on different servers); vertical partitioning splits by columns (rarely-used columns to a separate table). It improves manageability and lets the engine prune irrelevant partitions.
01 Learning objectives
0 / 6 done02 Curated reading
03 Knowledge check
- 01medium
Caching, sessions, and leaderboards are the sweet spot of which NoSQL family?
- 02medium
BASE (vs ACID) emphasises:
- 03hard
Sharding primarily scales ___, while replication primarily scales ___.
04 Interview questions
browse all ↗What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.
-
Name the four main NoSQL families and a use case where each beats a relational DB.
Document (MongoDB) — flexible JSON-like docs; content, catalogs, user profiles where the shape varies.
Key-value (Redis, DynamoDB) — fastest by-key access; caching, sessions, leaderboards, rate counters.
Wide-column (Cassandra, HBase) — massive distributed write scale; time-series, IoT, event logs.
Graph (Neo4j) — relationship-heavy traversals; social graphs, fraud rings, recommendations.
The through-line: each optimizes a specific access pattern that relational tables + joins serve poorly at scale.
Follow-ups they push on- Why is a graph DB better than SQL recursive joins for 'friends of friends of friends'?
- Document vs wide-column — how do their data models differ?
Red flag Treating 'NoSQL' as one thing, or claiming it's 'schemaless so always better' — each family has a narrow sweet spot.
source: MongoDB — Types of NoSQL Databases ↗ -
In MongoDB, when do you embed a sub-document vs reference another collection?
Embed when the child is owned by and always read with the parent, the relationship is one-to-few, and the embedded data doesn't grow unbounded — e.g. a user's addresses inside the user document. One read fetches everything; no join.
Reference (store an ObjectId, join with
$lookupor a second query) when the child is large, shared across parents (many-to-many), updated independently, or the array would grow without bound (a celebrity's millions of followers). This avoids the 16MB document cap and write amplification.Rule: model around your access patterns, not entities — 'data that is accessed together should be stored together.'
Follow-ups they push on- What's MongoDB's document size limit, and how does it force referencing?
- How would you model a comments-on-posts relationship?
Red flag Reflexively normalizing like a relational schema, or embedding an unbounded growing array that eventually hits the 16MB document limit.
source: MongoDB — Data Modeling Introduction ↗ -
What is BASE and how does it differ from ACID?
BASE = Basically Available, Soft state, Eventual consistency. It's the consistency model many NoSQL/distributed stores choose: stay available and partition-tolerant, accept that replicas converge *eventually* rather than being instantly consistent.
Vs ACID, which insists every transaction leaves the DB strongly consistent and isolated. BASE relaxes that to gain availability and horizontal scale. It's the practical face of the CAP theorem: under a network partition you pick availability (BASE/AP) or consistency (ACID/CP). Use BASE where stale-by-seconds reads are fine (feeds, product views); use ACID where they aren't (payments).
Follow-ups they push on- State the CAP theorem and which corner BASE sits in.
- Give a feature where eventual consistency is unacceptable.
Red flag Equating 'NoSQL' with 'no transactions' — many (MongoDB, DynamoDB) now offer ACID transactions; BASE is a choice, not an inherent limitation.
source: MongoDB — ACID Transactions / Database Consistency ↗ -
Explain sharding vs replication vs partitioning. How are they different?
Replication — keep copies of the same data on multiple nodes (leader-follower). Goal: high availability + read scaling + durability. It does *not* increase write capacity (one leader takes writes).
Sharding — split the dataset into disjoint pieces across nodes by a shard key, each node owning a subset. Goal: scale writes and storage beyond one machine.
Partitioning — the general term for splitting a table: horizontal = rows split across partitions (sharding is horizontal partitioning across servers); vertical = columns split into separate tables.
In practice you combine them: shard for write scale, then replicate each shard for HA.
Follow-ups they push on- Why does replication alone not scale writes?
- How do you choose a shard key, and what's a hot-shard / hotspot?
Red flag Using sharding and replication interchangeably — replicas are full copies (HA + reads), shards are disjoint subsets (write/storage scale).
source: MongoDB — Sharding ↗ -
How would you choose a shard key, and what goes wrong with a bad one?
A good shard key has high cardinality, even write distribution, and matches your query pattern so most queries hit one shard (targeted, not scatter-gather).
Failure modes: a monotonically increasing key (timestamp, auto-increment id) sends all new writes to one shard — a 'hot shard'. A low-cardinality key (country, status) can't split finely enough. A key that doesn't appear in queries forces every query to fan out to all shards (broadcast).
Mitigations: hashed shard keys to spread writes, or compound keys (e.g.
user_id+ time) to keep related data together while distributing load.Follow-ups they push on- Why is a hashed shard key better for write distribution but worse for range queries?
- What is a scatter-gather query and why is it slow?
Red flag Picking an auto-increment or timestamp shard key and creating a permanent hotspot, or a key absent from common queries forcing broadcasts.
source: MongoDB — Choose a Shard Key ↗ -
What is the MongoDB aggregation pipeline, and how does it map to SQL?
The aggregation pipeline passes documents through ordered stages, each transforming the stream and feeding the next — like Unix pipes for data.
Rough SQL mapping:
$match~ WHERE,$group~ GROUP BY (+ aggregates),$project~ SELECT (shape columns),$sort~ ORDER BY,$limit/$skip~ LIMIT/OFFSET,$lookup~ LEFT JOIN,$unwind~ flatten an array into rows.Stage order matters for performance: put
$matchand$sortearly so they can use indexes and shrink the working set before expensive$group/$lookup.Follow-ups they push on- Why put $match as early as possible in the pipeline?
- What does $unwind do and when is it needed before $group?
Red flag Ordering stages so `$match` comes after a `$group`/`$lookup`, defeating index use and processing far more documents than necessary.
source: MongoDB — Aggregation Pipeline ↗ -
Why use Redis for caching, and what are the main eviction/expiry concerns?
Redis is an in-memory key-value store, so reads/writes are microsecond-fast — ideal as a cache in front of a slower primary DB, plus sessions, rate limiters, and leaderboards (sorted sets).
Key concerns: set a TTL (
EXPIRE) so stale data ages out; pick an eviction policy for when memory is full (allkeys-lru,allkeys-lfu,volatile-ttl, etc.); and have a cache-invalidation strategy on writes (write-through, or delete-on-update). Watch for stampede — many requests recomputing a hot key the instant it expires — mitigated by locks or jittered TTLs.Follow-ups they push on- Cache-aside vs write-through vs write-behind?
- What is a cache stampede / thundering herd, and how do you avoid it?
Red flag Caching without a TTL or invalidation plan (serving stale data forever), or ignoring eviction so the cache silently drops keys under memory pressure.
source: Redis — Key eviction (docs) ↗ -
When would you NOT use NoSQL — i.e., when is a relational database still the right call?
Choose relational when you need strong multi-row transactions / ACID (money, inventory, bookings), flexible ad-hoc queries and joins across well-structured related data, constraints and referential integrity enforced by the DB, and a stable schema.
NoSQL earns its place for huge scale on a known access pattern, flexible/evolving document shapes, or relationship-traversal workloads. The honest senior answer is 'it depends on access patterns and consistency needs' — and modern Postgres (JSONB, partitioning, logical replication) covers many cases people reach for NoSQL for.
Follow-ups they push on- How does Postgres JSONB blur the SQL/NoSQL line?
- Polyglot persistence — when is mixing both justified?
Red flag Picking NoSQL for hype/scale you don't have, then reimplementing joins and transactions in application code; or assuming relational 'can't scale'.
source: MongoDB — NoSQL vs SQL Databases ↗ -
State the CAP theorem and explain why 'CA' isn't a real choice for a distributed database.
CAP says that when a network partition (P) happens, a distributed system can preserve at most one of Consistency (every read sees the latest write) and Availability (every request gets a non-error response) — you must drop one.
'CA' isn't a meaningful pick because partitions *will* happen in any real network — you don't get to opt out of P. So the real choice during a partition is CP (refuse/error to stay consistent — e.g. a leader-based store rejecting writes it can't replicate) or AP (answer with possibly-stale data and reconcile later — Dynamo-style stores). When there's *no* partition, a good system gives both C and A; CAP only forces the trade *during* a partition. PACELC extends it: else (no partition) you still trade latency vs consistency.
What a strong answer coversUnder a partition you choose Consistency or Availability, not both.
Partitions are unavoidable in real networks, so P isn't optional — 'CA' is a non-choice.
CP = stay consistent, reject/err during partition; AP = stay available, serve stale, reconcile.
CAP only bites *during* a partition; PACELC adds the latency-vs-consistency trade for normal operation.
Quick self-checkDuring a network partition, a payment system that must never double-charge should behave as…
-
Risky for payments — serving/accepting under partition can produce conflicting writes (double charges) to reconcile.
-
Correct — for financial correctness you sacrifice availability during the partition to preserve consistency.
-
Wrong — CA isn't achievable in a distributed system; you can't opt out of partitions.
-
Wrong — CAP applies to any distributed data store, SQL or NoSQL.
Follow-ups they push on- What does PACELC add to CAP?
- Give a real CP store and a real AP store and the workload each suits.
Red flag Treating CAP as 'pick any two' (you can't drop P) or thinking it forces a permanent global trade rather than one that only applies during a partition.
source: Wikipedia — CAP theorem ↗ -
Why is NoSQL data modeling driven by access patterns, and what does DynamoDB single-table design illustrate?
Relational modeling normalizes by entity and joins at read time. NoSQL stores (especially DynamoDB) have no joins and charge for every access, so you model queries first: list the access patterns, then design keys so each query is a single, indexed key lookup.
Single-table design takes this to the extreme — multiple entity types (users, orders, items) share one table, distinguished by a composite primary key (a generic partition key + sort key, often
PK/SKwith prefixes likeUSER#123/ORDER#456). Related items share a partition so one query fetches them together without a join, and secondary indexes (GSIs) serve alternate patterns. The cost is a rigid, query-specific schema that's painful to change when access patterns evolve.What a strong answer coversNo joins + per-request cost -> design around queries, not entities.
List access patterns first, then shape partition/sort keys so each is one key lookup.
Single-table design co-locates related items in a partition via prefixed composite keys.
Secondary indexes (GSIs) add alternate access patterns; the schema is rigid to new ones.
Follow-ups they push on- How does a composite (partition + sort) key let one query return several related items?
- What's the downside when a brand-new access pattern appears later?
Red flag Modeling a NoSQL store like a normalized relational schema and then needing joins the database can't do, forcing N round-trips or client-side joins.
source: AWS docs — DynamoDB single-table design / data modeling ↗ -
What is eventual consistency, and how do read-your-writes and quorum reads/writes fit in?
Eventual consistency: replicas may temporarily disagree, but with no new writes they all converge to the same value. The window means a read just after a write can return stale data.
Stronger guarantees layer on top. Read-your-writes ensures *you* always see your own latest write (route your reads to a replica known to have it, or to the leader). Quorum tunes consistency per operation: with N replicas, require W acks on write and R replicas on read; if R + W > N the read and write sets overlap, so a read is guaranteed to see the latest acknowledged write (strong consistency) — at the cost of latency/availability. Dynamo-style systems expose W/R so you trade consistency against speed per call.
What a strong answer coversEventual consistency: replicas converge once writes stop; reads can be briefly stale.
Read-your-writes: a session always sees its own latest write.
Quorum: pick W (write acks) and R (read replicas) out of N.
R + W > Nguarantees overlap -> a read sees the latest committed write (strong consistency).Higher R/W means stronger consistency but more latency and less availability.
Quick self-checkWith N=3 replicas, which (W, R) configuration guarantees strongly-consistent reads?
-
No — R + W = 2, not > 3; the read and write sets may not overlap, so reads can be stale.
-
Correct — R + W = 4 > 3, so the read set and write set must share at least one replica with the latest write.
-
No — R + W = 3, which is not strictly greater than N=3; overlap isn't guaranteed.
-
Misleading — R=3 with W=1 does give R+W=4>3, but the framing 'only if' is wrong; the quorum math is what guarantees it, and this option understates W's role.
Follow-ups they push on- Why does R + W > N guarantee a read sees the newest write?
- What's a tunable example — W=N for strong writes vs W=1 for fast writes?
Red flag Assuming eventual consistency means 'never consistent', or thinking any single quorum value is right — R/W are a per-workload latency-vs-consistency dial.
source: Wikipedia — Eventual consistency ↗ -
When is a graph database the right tool, and why does it beat relational recursive joins for deep traversals?
Use a graph DB (Neo4j) when relationships are first-class and traversals are deep/variable-length: social graphs ('friends of friends of friends'), fraud rings, recommendation paths, dependency/permission graphs.
In a relational store, each 'hop' is another self-join, and a 4-hop query means 4 joins whose cost compounds with table size — the optimizer re-finds matching rows by index lookup each level. A graph DB uses index-free adjacency: each node directly stores pointers to its neighbors, so traversing one more hop is O(neighbors of the current node), independent of total graph size. That makes variable-depth path queries (shortest path, reachability) both fast and natural to express (Cypher's
MATCH (a)-[:FRIEND*1..4]->(b)).What a strong answer coversGraph DBs shine when relationships and multi-hop traversal are the core workload.
Index-free adjacency: nodes point straight at neighbors, so a hop is local, not a global index lookup.
Relational deep traversal = N self-joins whose cost compounds with table size.
Variable-length paths (shortest path, reachability) are awkward in SQL, native in graph query languages.
Follow-ups they push on- What is index-free adjacency, concretely?
- Could a recursive CTE handle this in SQL, and where does it fall down at scale?
Red flag Forcing a deeply-connected, variable-depth traversal into repeated SQL self-joins/recursive CTEs and watching it degrade as hop count and table size grow.
source: Neo4j — Graph Database Concepts (index-free adjacency) ↗ -
Cache-aside vs write-through vs write-behind — compare the caching strategies.
Cache-aside (lazy): the app checks the cache; on a miss it reads the DB and populates the cache, and on writes it updates the DB and *invalidates* the key. Simple and resilient (cache down ≠ data loss), but the first read after a miss/eviction is slow and there's a brief staleness window.
Write-through: writes go to cache and DB synchronously, so the cache is always fresh — at the cost of higher write latency and caching data that may never be read.
Write-behind (write-back): writes hit the cache and are flushed to the DB asynchronously — lowest write latency, highest throughput, but risks data loss if the cache fails before flushing and adds complexity. Cache-aside is the common default for read-heavy web workloads.
What a strong answer coversCache-aside: app-managed, populate on miss, invalidate on write — simple, resilient, can serve stale briefly.
Write-through: write cache+DB together — always fresh, slower writes, may cache unread data.
Write-behind: async flush to DB — fastest writes, but risks loss on cache failure.
Default to cache-aside for read-heavy systems; reserve write-behind for write-heavy, loss-tolerant cases.
Quick self-checkWhich strategy has the **lowest write latency** but the **highest risk of data loss**?
-
No — it writes to the DB directly (then invalidates), so no loss, but it's not the lowest write latency.
-
No — it writes cache and DB synchronously, so writes are durable but slower, not the fastest.
-
Correct — it acks the write from cache and flushes to the DB asynchronously: fastest writes, but unflushed data is lost if the cache fails.
-
No — read-through governs cache population on reads, not write latency.
Follow-ups they push on- Why does write-behind risk data loss, and how do you mitigate it?
- How do you avoid a cache stampede when a hot cache-aside key expires?
Red flag Choosing write-behind for data you can't afford to lose, or running cache-aside without an invalidation step so the cache serves stale data after every update.
source: AWS — Caching strategies (lazy loading / write-through) ↗ -
A NoSQL store is 'schemaless' — what does that actually mean, and what's the catch?
'Schemaless' means the database doesn't enforce a fixed schema — different documents in a collection can have different fields, and you can add a field without a migration. It's better called schema-on-read: the structure is interpreted by the application when it reads, rather than enforced by the database on write.
The catch is the schema doesn't disappear — it moves into your application code, which must handle missing fields, mixed types, and old document shapes (versioning) forever. Without DB-level constraints you can silently write inconsistent data, so mature NoSQL stores add optional validation (MongoDB JSON Schema validators) and teams still enforce structure in code. 'Flexible' is the upside; 'no guardrails' is the downside.
What a strong answer coversSchemaless = the DB doesn't enforce structure; really 'schema-on-read'.
The schema moves into application code, which must tolerate missing/old/variant shapes.
Flexibility speeds iteration but removes the DB's data-integrity guardrails.
Mitigate with optional validators (MongoDB schema validation) and explicit document versioning.
Quick self-checkA 'schemaless' document store most accurately means…
-
No — the data still has structure; it's enforced/interpreted by the app on read, not by the DB on write.
-
Correct — 'schemaless' is schema-on-read; the burden moves to application code.
-
No — the point is the DB does *not* enforce a fixed schema; documents can vary.
-
No — it stores rich structured documents; it just doesn't impose one uniform schema.
Follow-ups they push on- How does schema-on-read differ from schema-on-write?
- How do you evolve millions of existing documents to a new shape?
Red flag Believing 'schemaless' means no schema to manage — the schema is just enforced (or not) in application code, where inconsistencies accumulate silently.
source: MongoDB — Schema Validation ↗