Hashing structures
Hash maps/sets give average O(1) by key. Understand collisions, load factor/rehash, and when a hashmap is the wrong tool (ordering / range queries).
The hash map is the workhorse of everyday code: it turns “find the value for this key” from an O(n) scan into an average O(1) lookup. The mental model is a coat check — you hand over a coat (the key), a function stamps it with a ticket number (the bucket), and retrieval jumps straight to that slot instead of searching the whole rack. Understanding how it gets that speed also tells you exactly when it betrays you.
How average O(1) happens
A hash map runs the key through a hash function that produces a number, then maps that number to a slot (bucket) in an array, usually with hash(key) % numBuckets. Because the slot is computed directly, insert, lookup, and delete don’t search — they jump straight to the bucket, giving average O(1). A set is the same machinery storing only keys: membership and uniqueness in average O(1), ideal for dedup and “have I seen this before?” checks.
The “average” qualifier is load-bearing. Two distinct keys can hash to the same bucket — a collision — and how the map resolves those, plus how full it gets, determines whether you actually keep O(1).
Collisions and load factor
There are two classic ways to handle collisions. Separate chaining stores each bucket as a small list (or, in modern Java’s HashMap, a tree once a bucket gets long): colliding keys append to that bucket’s list. Open addressing keeps everything in the array itself and, on collision, probes for the next free slot (linear probing, quadratic probing, double hashing). Either way, the worst case is the same: if many keys collide into one bucket, lookups walk that bucket and degrade to O(n).
What keeps buckets short is the load factor — entries divided by buckets. When it crosses a threshold (commonly ~0.75), the table allocates a larger bucket array and rehashes every entry into it. That resize is an occasional O(n) cost, but it’s amortized away, keeping the average O(1). Two things break this: a bad hash function that clusters keys, and adversarial keys deliberately chosen to collide.
| Strategy | How it resolves a collision | Strength | Weakness |
|---|---|---|---|
| Separate chaining | append to the bucket's list/tree | simple; tolerates high load factor | extra pointers; cache-unfriendly chains |
| Open addressing | probe to the next free slot in the array | cache-friendly; no per-entry pointers | clusters; degrades sharply near full |
Counting occurrences is the textbook hash-map win: one pass, O(1) per key, O(n) total — versus O(n^2) if you re-scanned for each element.
function topWord(words) {
const counts = new Map(); // key → frequency
for (const w of words) {
counts.set(w, (counts.get(w) ?? 0) + 1); // average O(1) per word
}
let best = null, max = 0;
for (const [w, c] of counts) {
if (c > max) { max = c; best = w; }
}
return best; // O(n) total
}The same shape covers dedup (use a Set), grouping (key → list), and memoizing results by argument — any time you find yourself re-scanning to answer “have I seen this / how many?”.
When a hash map is the wrong tool
A hash map scatters keys across buckets by hash value, which means it has no order. The moment you need ordering or range queries, it’s the wrong structure — reach for a balanced tree (an ordered map, like Java’s TreeMap or C++‘s std::map) instead.
| Need | Use | Why |
|---|---|---|
| Lookup by exact key | Hash map / set | average O(1), no ordering needed |
| Dedup / membership | Set | O(1) “seen it?” checks |
| Keys in sorted order | Balanced tree (ordered map) | hash maps have no order |
| Range queries (keys in [a,b]) | Balanced tree | O(log n) range scan; hash can't range |
| Min / max / “next greater” | Balanced tree or heap | ordering the hash map throws away |
01 Learning objectives
0 / 5 done02 Knowledge check
- 01easy
Average-case lookup in a hash map is:
- 02medium
You need to keep keys in sorted order AND do range queries. A hash map is wrong because…
- 03medium
Exceeding a hash table's load factor triggers a resize/rehash.
03 Interview questions
browse all ↗What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.
-
How does a hash map achieve average O(1) lookup, and why is the worst case O(n)?
A hash function maps a key to a bucket index, so with a good hash and a reasonable load factor most buckets hold ~1 entry and lookup is average O(1). The worst case is O(n) when many keys collide into the same bucket (bad hash, adversarial keys, or everything hashing the same), degrading a bucket into a linear scan. The O(1) is therefore an expected/average bound, not a guarantee.
Follow-ups they push on- What makes a hash function 'good'?
- How can an attacker force the worst case (hash flooding)?
Red flag Stating O(1) as a hard worst-case guarantee instead of an average/expected one.
source: Hirist — Top HashMap Interview Questions ↗ -
Compare separate chaining and open addressing for collision handling.
Separate chaining stores colliding keys in a per-bucket list (or tree, as Java 8+ does past a threshold), so it tolerates high load factors but pays pointer/indirection overhead. Open addressing keeps everything in the array and probes for the next free slot (linear/quadratic probing, double hashing); it's cache-friendlier but degrades sharply as load factor approaches 1 and complicates deletion. The choice trades memory locality against sensitivity to load factor.
Follow-ups they push on- Why does deletion need tombstones in open addressing?
- Why does Java convert long chains into trees?
Red flag Describing chaining and open addressing as interchangeable without noting their load-factor and deletion behaviour differs.
source: GetSDEReady — HashMap & HashSet Interview Questions ↗ -
What is a load factor, and what happens when it's exceeded?
Load factor is entries divided by buckets — a measure of how full the table is (Java's HashMap defaults to 0.75). When it's exceeded the table resizes: capacity roughly doubles and every key is rehashed into the larger array, an O(n) operation that happens rarely, keeping amortized insert O(1). A higher load factor saves memory but raises collision rates and slows lookups; a lower one wastes space.
Follow-ups they push on- Why double the capacity rather than grow by a constant?
Red flag Thinking each insert that crosses the threshold is cheap, or that resize never happens.
source: Hirist — Top HashMap Interview Questions ↗ -
Given an array and a target, return indices of two numbers that sum to the target.
Walk the array once, and for each value
xcheck a hash map fortarget - x; if present you've found the pair, otherwise storex -> indexand continue. This is O(n) time, O(n) space — the canonical 'use a hash map to remember what you've seen' problem. The brute-force double loop is O(n^2); the hash map trades space for that speedup.Follow-ups they push on- What changes if the array is already sorted?
- How would you return all unique pairs (3Sum-style)?
Red flag Matching an element with itself by checking the map before inserting the current element incorrectly.
source: LeetCode 1 — Two Sum (company tags) ↗ -
Group a list of strings into anagrams.
Use a hash map keyed by a canonical form of each word and collect words sharing a key. The canonical key is either the sorted characters (O(k log k) per word) or a 26-length character-count signature (O(k) per word); the latter is faster for long strings. Total time is about O(n*k), space O(n*k). The trick the interviewer is probing is choosing a good collision-free key.
Follow-ups they push on- Which key is better when words are long, and why?
Red flag Comparing every pair of words for the anagram relation (O(n^2 * k)) instead of bucketing by a canonical key.
source: LeetCode 49 — Group Anagrams (company tags) ↗ -
Count the number of contiguous subarrays whose sum equals k.
Track a running prefix sum and a hash map of how many times each prefix sum has occurred; at each index, the count of subarrays ending here equals the number of earlier prefix sums equal to
prefixSum - k. Seed the map with{0: 1}to count subarrays starting at index 0. This is O(n) time, O(n) space, versus the O(n^2) brute force.Follow-ups they push on- Why must the map be seeded with prefix sum 0?
Red flag Forgetting the `{0:1}` seed, which drops every subarray that starts at index 0.
source: LeetCode 560 — Subarray Sum Equals K (company tags) ↗ -
When is a hash map the wrong data structure? What do you reach for instead?
A hash map gives no ordering, so it's wrong when you need sorted iteration, the min/max, or range queries ('all keys between A and B'). For those, use an ordered/tree-based map (red-black tree, like Java's TreeMap or C++ std::map) giving O(log n) ordered operations, or a heap when you only need the extreme. Hash maps shine for pure key lookup, dedup, and frequency counting.
Follow-ups they push on- What does a TreeMap give you that a HashMap can't?
Red flag Defaulting to a hash map for problems that need ordering or range scans and then bolting on a sort every query.
source: AlgoArk — Hash Map Patterns for Interviews ↗ -
Design a structure with insert, delete, and getRandom all in average O(1).
Combine a dynamic array (for O(1) random access by index) with a hash map from value to its index in the array. Insert appends and records the index; delete swaps the target with the last element, pops the tail, and fixes the moved element's index; getRandom picks a random array index. The swap-with-last trick is what keeps delete O(1) instead of O(n).
Follow-ups they push on- How do you support duplicate values?
Red flag Deleting by shifting the array (O(n)) instead of swapping the victim with the last element.
source: LeetCode 380 — Insert Delete GetRandom O(1) (company tags) ↗ -
Find the first non-repeating character in a string and return its index.
Make one pass to build a hash map (or 26-length array) of character counts, then a second pass over the string returning the index of the first character whose count is 1; return -1 if none. Two linear passes, O(n) time, O(1) space if the alphabet is fixed (at most 26/128 entries). The second pass must walk the original string order, not the map, because a map has no positional order.
What a strong answer coversPass 1: count frequencies in a map/array.
Pass 2: scan the string in order, return first index with count 1.
O(n) time; O(1) space for a fixed alphabet.
Iterate the string (ordered), not the map (unordered), in pass 2.
Follow-ups they push on- Why can't you find the answer by iterating the hash map directly?
- How would you support a streaming version where characters arrive over time?
Red flag Iterating the map instead of the string in pass 2 and returning a non-first unique character because maps lack order.
source: LeetCode 387 — First Unique Character in a String (company tags) ↗ -
Find the length of the longest consecutive sequence of integers in an unsorted array, in O(n).
Put every number in a hash set for O(1) membership, then for each number
xstart counting a run only ifx - 1is absent (so x is a sequence start); from such a start, extend x, x+1, x+2, … while present and track the longest. Starting only at run-beginnings means each number is visited O(1) times overall, giving O(n) time, O(n) space — beating the O(n log n) sort-then-scan.What a strong answer coversSet membership gives O(1) 'is this number present?' checks.
Only begin counting where
x-1is missing (a run start).That guard bounds total work to O(n), not O(n²).
Beats sorting (O(n log n)) by trading time for O(n) space.
Quick self-checkWhy is the algorithm O(n) and not O(n²) despite the inner while-loop?
-
Wrong — a single run can be long; the bound comes from where runs start.
-
Correct — the `x-1 absent` guard ensures each element is visited O(1) times across all runs.
-
Wrong — sorting would make it O(n log n); this approach avoids sorting.
-
Wrong — O(1) lookups alone don't prevent quadratic total work; the run-start guard does.
Follow-ups they push on- Why does the 'only start where x-1 is absent' check keep it O(n)?
- What if duplicates are present in the input?
Red flag Extending a run from every element (O(n^2)) instead of only from numbers that begin a run.
source: LeetCode 128 — Longest Consecutive Sequence (company tags) ↗ -
What makes a good hash function, and what is 'hash flooding' (algorithmic complexity attack)?
A good hash function distributes keys uniformly across buckets, is fast to compute, and is deterministic — minimizing collisions so buckets stay ~O(1). Hash flooding is a denial-of-service attack where an adversary crafts many keys that all hash to the same bucket, collapsing every lookup/insert to O(n) and the whole table to O(n^2) work — historically used to DoS web servers via crafted POST/query parameters. Defenses include per-process randomized/seeded hashing (SipHash) so an attacker can't predict the bucket, and converting long collision chains into balanced trees (Java 8+ does this).
What a strong answer coversGood hash: uniform, fast, deterministic ⇒ low collision rate.
Hash flooding forces worst-case collisions ⇒ O(n) ops, O(n²) total (DoS).
Defense 1: seeded/randomized hashing (e.g. SipHash) hides the mapping.
Defense 2: treeify long chains (O(n) → O(log n) within a bucket).
Follow-ups they push on- Why does a per-process random seed defeat the attack?
- How does treeifying long buckets bound the worst case?
Red flag Assuming worst-case collisions only happen by chance and ignoring that they can be deliberately induced.
source: GeeksforGeeks — Hash Functions and Hashing ↗ -
Why must objects used as hash-map keys be effectively immutable, and what is the equals/hashCode contract?
A hash map places a key in a bucket derived from its hash; if you mutate a key after insertion so its hash changes, the entry is now in the 'wrong' bucket and lookups silently fail to find it. The equals/hashCode contract is the rule that ties them together: if two objects are equal they must have the same hash code, and equal objects must stay equal — so keys should be immutable (or at least their hash-relevant fields must be). Override
hashCodewhenever you overrideequals, or hash-based collections break.What a strong answer coversBucket is chosen from the key's hash at insert time.
Mutating a key's hash-relevant fields strands the entry in the wrong bucket.
Contract: equal objects ⇒ equal hash codes (not vice-versa).
Override
equals⇒ you must overridehashCodetoo.
Quick self-checkYou override `equals` to compare two fields but leave the default `hashCode`. What breaks?
-
Wrong — the map first picks a bucket by hashCode, so equal objects may land in different buckets.
-
Correct — the contract is violated; lookup hashes to a different bucket than where the key sits.
-
Wrong — hash maps don't sort keys at all.
-
Wrong — the failure is correctness (missed lookups), not just performance.
Follow-ups they push on- What goes wrong if hashCode is constant for all keys?
- Why is using a mutable list as a key dangerous?
Red flag Overriding `equals` but not `hashCode` (or mutating a key in place), so lookups for present keys return nothing.
source: Oracle Java SE — Object.hashCode() contract ↗ -
When should you use a hash set vs a hash map?
A hash set stores keys only and answers 'have I seen this?' — use it for membership, dedup, and presence checks. A hash map stores key→value associations — use it when you also need data attached to each key (counts, indices, last-seen position). Both give average O(1) ops; a set is essentially a map whose values you don't care about. Reach for the map the moment you need to remember *something about* each key, not just *that* you saw it.
What a strong answer coversSet: membership / dedup / 'seen?' — keys only.
Map: key → value — counts, indices, metadata per key.
Both average O(1); a set is a valueless map.
Two Sum needs a map (value→index); 'contains duplicate' needs only a set.
Quick self-checkYou must return the index of a matching earlier element. Set or map?
-
Wrong — a set proves presence but stores no index to return.
-
Correct — you need to recall *where* you saw the value, which requires a stored value.
-
Wrong — only the map retains the index you must return.
-
Wrong — a hash map solves it in one O(n) pass without sorting.
Follow-ups they push on- Which would you use for Two Sum, and why not the other?
- Which for 'does this array contain any duplicate'?
Red flag Using a set when you later need the associated value (e.g. an index), forcing an awkward rework.
source: AlgoArk — Hash Map Patterns for Interviews ↗ -
Determine whether an array contains any duplicate values.
Walk the array once, inserting each value into a hash set; if a value is already present, return true immediately, otherwise return false at the end. O(n) time, O(n) space. Alternatively sort first and check adjacent equal pairs for O(n log n) time and O(1) extra space — a clean time/space trade-off to mention.
What a strong answer coversHash set: insert each, return true on the first repeat.
O(n) time, O(n) space.
Sort-and-scan alternative: O(n log n) time, O(1) extra space.
Early exit on first duplicate; no need to finish the scan.
Follow-ups they push on- What if duplicates only count when within k indices of each other?
- How would you do it with O(1) extra space?
Red flag Comparing all pairs with a double loop (O(n^2)) when a single hash-set pass is O(n).
source: LeetCode 217 — Contains Duplicate (company tags) ↗ -
Why does iterating a hash map give no guaranteed order, and what should you use if you need ordering?
Entries are placed by hash value into buckets, so iteration order reflects the internal bucket layout — which changes with the hash function, capacity, and resizes — not insertion or sort order. If you need a stable order, use an insertion-ordered map (Java's LinkedHashMap, Python's dict since 3.7) for insertion order, or a tree/sorted map (TreeMap, C++ std::map) for key-sorted order at O(log n) per op. Never rely on a plain hash map's iteration order; it's an implementation detail that can differ across runs or versions.
What a strong answer coversIteration order follows bucket layout, not insertion or sort order.
Order can change after a resize/rehash or across language versions.
Need insertion order ⇒ LinkedHashMap / Python dict.
Need sorted order ⇒ TreeMap / std::map (O(log n) ops).
Quick self-checkYou need keys returned in sorted order on every iteration. Which structure?
-
Wrong — plain hash maps give bucket order, not sorted order.
-
Correct — it maintains keys in sorted order at O(log n) per operation.
-
Wrong — that preserves insertion order, not key-sorted order.
-
Wrong — sets are also unordered; switching set/map doesn't add ordering.
Follow-ups they push on- Python dicts preserve insertion order since 3.7 — is that the same as 'sorted'?
- What ordering does a TreeMap give, and at what cost?
Red flag Depending on a plain hash map's iteration order in tests or logic, then breaking when it changes.
source: AlgoArk — Hash Map Patterns for Interviews ↗