Trees
BSTs, self-balancing trees, B/B+ trees (why DB indexes use them), tries, and heaps. The B-tree item directly feeds the Databases indexes chapter.
Trees give you ordering and hierarchy that flat structures can’t. You rarely implement one by hand, but you reach for the idea constantly — and one of them, the B-tree, is literally why your database queries are fast. The unifying mental model: a tree lets you discard half (or more) of the remaining data at every step, which is how O(n) searches become O(log n). The skill here is recognizing which tree solves which problem.
BSTs and why balance matters
A binary search tree keeps keys ordered: every left child is smaller, every right child larger. That invariant turns search, insert, and delete into a walk down the tree — O(log n) when the tree is balanced. The catch is the qualifier. Insert already-sorted keys into a naive BST and it degenerates into a linked list: one long chain, O(n) for everything. A plain BST gives you ordering but no performance guarantee.
Self-balancing trees fix that. Red-black and AVL trees re-balance themselves on every insert/delete (via rotations) to keep height O(log n), guaranteeing log-time operations regardless of insertion order. This is what backs real ordered maps and sets — Java’s TreeMap, C++‘s std::map, Rust’s BTreeMap — and the Linux process scheduler (the old CFS used a red-black tree). Reach for one when you need keys kept in sorted order and guaranteed log-time ops; a hash map is faster on average but throws ordering away.
B-trees: why database indexes use them
A B-tree (and its variant, the B+ tree) is a balanced tree built for disk, not RAM. Each node holds many keys and has high fanout — hundreds of children — so the tree stays extremely shallow. A billion rows fit in three or four levels, meaning a lookup costs three or four block reads. Since disk (or SSD) I/O is orders of magnitude slower than comparing keys in memory, minimizing reads is the whole game, and that’s exactly why relational database indexes are B-trees.
The B+ tree refinement: all actual data/pointers live in the leaf nodes, and those leaves are linked together in order. That makes range scans and ORDER BY cheap — find the start leaf, then walk the linked leaves sequentially instead of re-traversing from the root for each row.
Tries and heaps
A trie (prefix tree) keys data by the path of characters, so lookup time depends on key length, not the number of stored keys. Searching for “cat” walks c → a → t regardless of whether the trie holds ten words or ten million. That makes it ideal for autocomplete, spell-checking dictionaries, and IP-routing tables (longest-prefix match) — anything that’s fundamentally about prefixes.
A heap is a different beast: it only maintains the min (or max) at the root, not full ordering. That buys O(1) peek and O(log n) insert/extract — the perfect engine for a priority queue. It’s usually stored as a flat array (parent at i, children at 2i+1 / 2i+2), so it’s also cache-friendly.
To find the K largest items in a stream of n, don’t sort all n (O(n log n)) — keep a min-heap of size K. Each new item is compared to the smallest kept; the heap stays at K.
// Keep the K largest seen so far in a min-heap of size K.
// For each of n items:
// if heap.size < K -> push O(log K)
// else if item > heap.peek() -> pop min, push item O(log K)
// Total: O(n log K), space O(K) — beats sorting when K << n.The same heap pattern drives Dijkstra’s shortest path (always expand the nearest unvisited node), task scheduling by priority, event simulation, and merging K sorted lists. The cue is “I need the next smallest/largest repeatedly, but not a full sort.”
| Tree | Reach for it when | Avoid when | Big-O |
|---|---|---|---|
| Self-balancing BST | sorted keys + guaranteed log-time ops | you only need point lookups (use a hash map) | search/insert/delete O(log n) |
| B+ tree | on-disk index; equality and range/ORDER BY | small in-memory data (overkill) | O(log n), few disk I/Os |
| Trie | prefix lookups, autocomplete, routing | you need whole-key exact match only | O(key length) |
| Heap | top-K, scheduling, Dijkstra, merge-K | you need to search arbitrary elements | peek O(1), insert/extract O(log n) |
Union-Find (disjoint-set)
One more tree-shaped structure earns its own mention because it answers a question the
others can’t cheaply: “are these two things in the same group?” as groups keep
merging. A union-find (a.k.a. disjoint-set union, DSU) maintains a partition of
elements into non-overlapping sets via a forest — each element points at a parent, and
the root is the set’s identity. Two operations: find(x) returns x’s root (which set),
and union(a, b) merges two sets.
Two optimizations make it astonishingly fast: union by rank/size (attach the smaller
tree under the bigger one) and path compression (on find, point each visited node
straight at the root). Together they give an amortized cost of O(α(n)) — the inverse
Ackermann function, which is ≤ 4 for any input you’ll ever see. Effectively constant.
class DSU {
constructor(n) {
this.parent = Array.from({ length: n }, (_, i) => i); // each node is its own set
this.rank = Array(n).fill(0);
}
find(x) { // path compression: flatten as we go
while (this.parent[x] !== x) x = this.parent[x] = this.parent[this.parent[x]];
return x;
}
union(a, b) { // union by rank
let ra = this.find(a), rb = this.find(b);
if (ra === rb) return false; // already connected → (in Kruskal) a cycle
if (this.rank[ra] < this.rank[rb]) [ra, rb] = [rb, ra];
this.parent[rb] = ra;
if (this.rank[ra] === this.rank[rb]) this.rank[ra]++;
return true;
}
}Reach for union-find when connectivity is dynamic and you have many queries:
detecting a cycle while adding edges to an undirected graph (a union that returns
false means both endpoints were already connected), grouping connected components, and
Kruskal’s minimum-spanning-tree algorithm (add the cheapest edge that doesn’t form a
cycle). When the graph is fixed and you ask once, a plain BFS/DFS (chapter 1.5) is
simpler; union-find wins when the structure grows and you keep asking “connected yet?”
01 Learning objectives
0 / 6 done02 Curated reading
03 Knowledge check
- 01easy
An unbalanced BST degrades search to:
- 02medium
Why do relational database indexes use B-trees / B+ trees?
- 03medium
You need the top-K largest items from a stream. Best structure?
- 04medium
You keep adding edges and repeatedly ask “are these two nodes connected yet?” Best tool?
04 Interview questions
browse all ↗What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.
-
What property defines a binary search tree, and what are its operation costs when balanced vs degenerate?
In a BST every node's left subtree holds only smaller keys and its right subtree only larger keys, so an in-order traversal yields sorted order. Search/insert/delete are O(log n) when the tree is balanced (height ~log n) but degrade to O(n) when it degenerates into a linked-list shape (e.g. inserting already-sorted data). That fragility is exactly why self-balancing variants exist.
Follow-ups they push on- What insertion order produces a degenerate BST?
- How do you validate that a tree is a proper BST?
Red flag Claiming a BST is always O(log n) without the 'when balanced' qualifier.
source: GeeksforGeeks — Self-Balancing Binary Search Trees ↗ -
Validate that a binary tree is a valid binary search tree.
Recurse with a valid (min, max) range for each node: the root is unbounded, the left child tightens the max to the parent's value and the right child tightens the min. A node fails if its value violates its range. O(n) time, O(h) stack space. Equivalently, an in-order traversal of a valid BST is strictly increasing, so you can check that the previous visited value is always smaller.
Follow-ups they push on- Why isn't it enough to just compare each node to its two children?
Red flag Only comparing a node against its immediate children, which misses violations deeper in a subtree.
source: LeetCode 98 — Validate Binary Search Tree (company tags) ↗ -
What is a self-balancing tree (AVL / red-black), and where are they used in real systems?
Self-balancing BSTs perform rotations on insert/delete to keep height O(log n), guaranteeing O(log n) operations regardless of input order. AVL trees keep height balance tighter (faster lookups, more rotations); red-black trees balance more loosely (fewer rotations, faster writes). They back ordered maps/sets such as Java's TreeMap and C++ std::map, and red-black trees appear in the Linux process scheduler.
Follow-ups they push on- When would you prefer AVL's stricter balance over red-black?
Red flag Treating AVL and red-black as identical, or not knowing they guarantee O(log n) by construction.
source: AlgoCademy — Introduction to Self-Balancing BSTs ↗ -
Why do relational databases use B-trees / B+ trees for indexes instead of a binary search tree?
B-trees are shallow and high-fanout — each node holds many keys, so the tree stays only a few levels deep even for millions of rows, which minimizes expensive disk seeks (disk I/O, not comparisons, is the bottleneck). A binary tree would be far taller and cost many more page reads. In a B+ tree all values live in the leaves, which are linked together, so range scans and ORDER BY can sweep the leaves sequentially without re-walking the tree.
Follow-ups they push on- Why does fanout matter more than tree height in comparisons?
- How does the linked leaf layer of a B+ tree help range queries?
Red flag Justifying B-trees by comparison count rather than by minimizing disk page reads.
source: Use The Index, Luke — Anatomy of an Index (B-tree) ↗ -
Implement a trie (prefix tree) supporting insert, search, and startsWith.
Each node holds a map/array of child links and an
isEndflag; insert walks/creates a path of nodes one character at a time, search walks the path and checksisEnd, and startsWith walks the path without requiringisEnd. All three are O(m) for a word of length m, independent of how many words are stored. Tries shine for autocomplete, spellcheck, and prefix-heavy lookups where a hash map can't answer prefix queries.Follow-ups they push on- Add wildcard '.' matching.
- How would you support delete?
Red flag Conflating 'a word ends here' (`isEnd`) with 'a prefix exists here', which breaks exact-word search.
source: LeetCode 208 — Implement Trie (company tags) ↗ -
What is a heap / priority queue, and what are the costs of peek, insert, and extract?
A binary heap is a complete tree (stored in an array) maintaining the heap property — each parent is <= (min-heap) or >= (max-heap) its children — so the extreme element sits at the root. Peek-min/max is O(1); insert and extract are O(log n) because you sift up/down one level at a time. It's the go-to for top-K, scheduling, Dijkstra, and merging K sorted streams.
Follow-ups they push on- Why is building a heap from n items O(n) and not O(n log n)?
Red flag Confusing a heap with a BST, or thinking it keeps all elements fully sorted (it only orders the root).
source: CodeJeet — Heap / Priority Queue Interview Questions ↗ -
Find the kth largest element in an unsorted array.
Maintain a min-heap of size k: push each element, and whenever the heap exceeds k pop the smallest, so the heap ends holding the k largest with the kth largest at its root. That's O(n log k) time, O(k) space. Quickselect gives average O(n) by partitioning around a pivot and recursing into only the relevant side, with O(n^2) worst case — mention both and the trade-off.
Follow-ups they push on- When is quickselect's O(n) average worth its O(n^2) worst case?
Red flag Sorting the whole array (O(n log n)) and indexing, or using a max-heap of size n when a size-k min-heap suffices.
source: LeetCode 215 — Kth Largest Element in an Array (company tags) ↗ -
Merge k sorted linked lists into one sorted list.
Push the head of each list into a min-heap keyed by node value; repeatedly pop the smallest, append it to the result, and push that node's successor. Each of the n total nodes is pushed/popped once at O(log k) cost, giving O(n log k) time and O(k) heap space. Divide-and-conquer pairwise merging hits the same O(n log k) without a heap.
Follow-ups they push on- Compare the heap approach with pairwise divide-and-conquer merging.
Red flag Concatenating all lists and sorting (O(n log n)) instead of exploiting that each list is already sorted.
source: LeetCode 23 — Merge k Sorted Lists (company tags) ↗ -
Compare the four binary-tree traversals (preorder, inorder, postorder, level-order) and say when you'd use each.
Preorder (node, left, right) visits the root first — good for copying/serializing a tree. Inorder (left, node, right) yields sorted order in a BST — good for validation and producing ordered output. Postorder (left, right, node) visits children before the parent — good for deletion and bottom-up aggregates like subtree sums/heights. Level-order is BFS with a queue, processing tier by tier — good for shortest-depth and 'by level' problems. The first three are DFS (recursion or stack); level-order is BFS (queue).
What a strong answer coversPreorder: serialize/clone (root before children).
Inorder: BST ⇒ sorted output; used for validation.
Postorder: delete / bottom-up subtree aggregates.
Level-order: BFS via queue; depth and per-level problems.
Quick self-checkWhich traversal of a valid BST produces the keys in ascending sorted order?
-
Wrong — preorder visits the root before its left subtree, so it isn't sorted.
-
Correct — left, node, right yields ascending order in a BST.
-
Wrong — postorder visits the root last, not in sorted position.
-
Wrong — BFS order reflects depth, not key ordering.
Follow-ups they push on- Which traversal reconstructs a BST's sorted sequence?
- Why is postorder natural for freeing/deleting a tree?
Red flag Mixing up the visit positions, or using a stack for level-order instead of a queue (that's DFS, not BFS).
source: GeeksforGeeks — Tree Traversals (Inorder, Preorder, Postorder) ↗ -
Find the lowest common ancestor (LCA) of two nodes in a binary tree.
Recurse: if the current node is null or equals either target, return it; otherwise recurse left and right. If both sides return non-null, the current node is the LCA (the targets split here); if only one side does, propagate that side up. O(n) time, O(h) stack space. If it's specifically a BST, you can do better: walk down, going left when both targets are smaller and right when both are larger — the first node that splits them is the LCA, O(h) time.
What a strong answer coversGeneral tree: both subtrees return non-null ⇒ this node is the LCA.
Return the non-null side upward when only one target is found below.
O(n) time, O(h) stack for the general case.
BST shortcut: descend by comparing values, first split node is the LCA.
Follow-ups they push on- How does the BST version beat the general O(n) approach?
- What changes if each node also stores a parent pointer?
Red flag Assuming both targets actually exist in the tree, or applying the BST descent on a non-BST.
source: LeetCode 236 — Lowest Common Ancestor of a Binary Tree (company tags) ↗ -
Return the level-order traversal of a binary tree (values grouped by level).
Run BFS with a queue: at each step record the current queue size (that's one full level), then dequeue exactly that many nodes, collect their values into a level list, and enqueue their children. Repeat until the queue empties. O(n) time, O(width) space. Snapshotting the queue size per round is the trick that cleanly separates one level from the next.
What a strong answer coversBFS with a queue; snapshot the level size each round.
Process exactly that many nodes to isolate one level.
Enqueue children as you go for the next level.
O(n) time, O(max width) space.
Follow-ups they push on- Produce a zigzag (alternating left-right) level order.
- Return only the rightmost node of each level (right side view).
Red flag Not capturing the level size before the loop, so children enqueued mid-level bleed into the current level.
source: LeetCode 102 — Binary Tree Level Order Traversal (company tags) ↗ -
Compute the diameter of a binary tree (longest path between any two nodes).
Do a single postorder DFS that returns each node's height while updating a global max: at each node, the longest path *through* it is
leftHeight + rightHeight(in edges), so track the maximum of that across all nodes and return1 + max(leftHeight, rightHeight)to the parent. O(n) time, O(h) stack. The key insight is that the answer is a path that bends at some node, computed from its two subtree depths.What a strong answer coversPostorder DFS returns height; a side variable tracks the best diameter.
Path through a node =
leftHeight + rightHeight(edge count).Return
1 + max(left, right)upward as the node's height.O(n) time, O(h) stack — one traversal, not one per node.
Follow-ups they push on- Why compute height and diameter in the same pass instead of two?
- Should the diameter be measured in nodes or edges (be consistent)?
Red flag Recomputing height separately at every node (O(n^2)) instead of folding it into one postorder pass.
source: LeetCode 543 — Diameter of Binary Tree (company tags) ↗ -
Why is building a heap from n elements O(n) and not O(n log n)? And how do you do an in-place heapsort?
Bottom-up heapify (sift-down from the last internal node up to the root) is O(n), not O(n log n), because most nodes sit near the leaves and sift down only a tiny distance — summing the work weighted by height gives a convergent series bounded by O(n). (Inserting one-by-one with sift-up is the O(n log n) way.) Heapsort then builds a max-heap in place, repeatedly swaps the root (the max) with the last unsorted element and sifts down the reduced heap — O(n log n) time, O(1) extra space, but not stable.
What a strong answer coversBottom-up heapify is O(n): most nodes are shallow, work sums to O(n).
Repeated sift-up inserts would be O(n log n) — the slower build.
Heapsort: build max-heap, swap root to the end, shrink, sift down.
Heapsort is O(n log n), O(1) space, not stable.
Quick self-checkWhy is bottom-up heap construction O(n) rather than O(n log n)?
-
Wrong — a sift-down can be up to O(log n); the savings come from the height distribution.
-
Correct — the cost is Σ (nodes at height h)·h, which sums to O(n).
-
Wrong — sift-down does compare; the bound is about how far nodes move.
-
Wrong — bottom-up heapify eagerly fixes every subtree.
Follow-ups they push on- Why is sift-down-from-the-bottom cheaper than n separate insertions?
- Why isn't heapsort stable, and when does that matter?
Red flag Claiming heap construction is always O(n log n), conflating the build phase with n individual insertions.
source: GeeksforGeeks — Time Complexity of Building a Heap ↗ -
What advantage does a trie have over a hash map for storing strings, and what's the catch?
A trie answers prefix queries — 'all words starting with "pre"', autocomplete, longest-prefix matching — which a hash map cannot do without scanning every key, and it shares storage for common prefixes. Lookups are O(m) in the word length, independent of how many words are stored. The catch is memory: each node carries a child map/array (up to alphabet size), so a sparse trie can use far more memory than a hash set of the same words, and it's only worthwhile when prefix operations matter.
What a strong answer coversTrie supports prefix / autocomplete queries; a hash map can't, cheaply.
Lookup is O(m) in word length, not in the number of stored words.
Common prefixes are shared, but each node holds child links.
Catch: high memory overhead; use only when prefixes matter.
Quick self-checkWhat can a trie do that a hash map of the same words fundamentally cannot do efficiently?
-
Wrong — a hash map already does exact lookup in average O(1).
-
Correct — prefix enumeration walks a subtree; a hash map would have to scan all keys.
-
Wrong — tries usually use more memory due to per-node child links.
-
Wrong — neither structure stores duplicate keys by design.
Follow-ups they push on- How would you compress a sparse trie (radix/Patricia trie)?
- When is a plain hash set strictly better than a trie?
Red flag Reaching for a trie when only exact-match lookup is needed — a hash set is simpler and lighter there.
source: GeeksforGeeks — Trie Data Structure ↗ -
Find the kth smallest element in a binary search tree.
Do an inorder traversal (which visits BST keys in ascending order) and stop at the kth visited node — you don't need to traverse the whole tree. An iterative inorder with an explicit stack lets you halt early at O(h + k) time. If the tree is queried for many different k values, augment each node with its left-subtree size so each query becomes O(h) by navigating directly.
What a strong answer coversInorder visits BST keys ascending ⇒ the kth visited is the answer.
Stop early at the kth node; no full traversal needed.
Iterative stack-based inorder ⇒ O(h + k) time.
For repeated queries, store subtree sizes ⇒ O(h) per query.
Follow-ups they push on- How do subtree-size augmentations speed up many repeated queries?
- How would you find the kth largest instead?
Red flag Collecting the entire inorder list and indexing (O(n)) instead of stopping at the kth element.
source: LeetCode 230 — Kth Smallest Element in a BST (company tags) ↗