1.3 ★ core [I][J] 15 interview Q's

Hashing structures

Hash maps/sets give average O(1) by key. Understand collisions, load factor/rehash, and when a hashmap is the wrong tool (ordering / range queries).

The hash map is the workhorse of everyday code: it turns “find the value for this key” from an O(n) scan into an average O(1) lookup. The mental model is a coat check — you hand over a coat (the key), a function stamps it with a ticket number (the bucket), and retrieval jumps straight to that slot instead of searching the whole rack. Understanding how it gets that speed also tells you exactly when it betrays you.

How average O(1) happens

A hash map runs the key through a hash function that produces a number, then maps that number to a slot (bucket) in an array, usually with hash(key) % numBuckets. Because the slot is computed directly, insert, lookup, and delete don’t search — they jump straight to the bucket, giving average O(1). A set is the same machinery storing only keys: membership and uniqueness in average O(1), ideal for dedup and “have I seen this before?” checks.

The “average” qualifier is load-bearing. Two distinct keys can hash to the same bucket — a collision — and how the map resolves those, plus how full it gets, determines whether you actually keep O(1).

FIG 1 · buckets and a collision hash(key) % 8 picks a bucket. 'sky' and 'cat' collide on bucket 2 — chaining stores both in a short list.

Collisions and load factor

There are two classic ways to handle collisions. Separate chaining stores each bucket as a small list (or, in modern Java’s HashMap, a tree once a bucket gets long): colliding keys append to that bucket’s list. Open addressing keeps everything in the array itself and, on collision, probes for the next free slot (linear probing, quadratic probing, double hashing). Either way, the worst case is the same: if many keys collide into one bucket, lookups walk that bucket and degrade to O(n).

What keeps buckets short is the load factor — entries divided by buckets. When it crosses a threshold (commonly ~0.75), the table allocates a larger bucket array and rehashes every entry into it. That resize is an occasional O(n) cost, but it’s amortized away, keeping the average O(1). Two things break this: a bad hash function that clusters keys, and adversarial keys deliberately chosen to collide.

Strategy	How it resolves a collision	Strength	Weakness
Separate chaining	append to the bucket's list/tree	simple; tolerates high load factor	extra pointers; cache-unfriendly chains
Open addressing	probe to the next free slot in the array	cache-friendly; no per-entry pointers	clusters; degrades sharply near full

Most language runtimes chain; many high-performance hash maps open-address for cache locality.

Frequency map — the canonical use

Counting occurrences is the textbook hash-map win: one pass, O(1) per key, O(n) total — versus O(n^2) if you re-scanned for each element.

function topWord(words) {
  const counts = new Map();          // key → frequency
  for (const w of words) {
    counts.set(w, (counts.get(w) ?? 0) + 1);  // average O(1) per word
  }
  let best = null, max = 0;
  for (const [w, c] of counts) {
    if (c > max) { max = c; best = w; }
  }
  return best;                        // O(n) total
}

The same shape covers dedup (use a Set), grouping (key → list), and memoizing results by argument — any time you find yourself re-scanning to answer “have I seen this / how many?”.

When a hash map is the wrong tool

A hash map scatters keys across buckets by hash value, which means it has no order. The moment you need ordering or range queries, it’s the wrong structure — reach for a balanced tree (an ordered map, like Java’s TreeMap or C++‘s std::map) instead.

Need	Use	Why
Lookup by exact key	Hash map / set	average `O(1)`, no ordering needed
Dedup / membership	Set	`O(1)` “seen it?” checks
Keys in sorted order	Balanced tree (ordered map)	hash maps have no order
Range queries (keys in [a,b])	Balanced tree	`O(log n)` range scan; hash can't range
Min / max / “next greater”	Balanced tree or heap	ordering the hash map throws away

Hash map for point lookups; tree the moment order or ranges matter.

Hash-flooding: a DoS hiding in your dictionary multiple vendors

In 2003 Crosby and Wallach showed that the average-O(1) guarantee is also an attack surface. If a server uses a deterministic hash function and lets users supply keys — HTTP POST parameters, JSON object fields, HTTP headers — an attacker can craft thousands of keys that all hash to the same bucket. Every insert then walks the full chain, turning each O(1) operation into O(n) and the whole request into O(n²). A few hundred KB of carefully chosen form fields could pin a CPU for minutes.

This wasn’t theoretical: in 2011 the 28C3 “Effective DoS” talk demonstrated it against PHP, Java, Python, Ruby, ASP.NET, and Node.js/v8 simultaneously, triggering an industry-wide round of CVEs. The fix the whole industry adopted: randomized, seeded hash functions (SipHash) so an attacker can’t predict which keys collide. It’s why modern runtimes salt their string hashes with a per-process random seed — and why “never trust user input, even as a map key” is a real security rule.

read the writeup ↗ cs.rice.edu

01 Learning objectives

0 / 5 done

02 Knowledge check

knowledge check3 questions · pass ≥ 70%

01easy
Average-case lookup in a hash map is:
02medium
You need to keep keys in sorted order AND do range queries. A hash map is wrong because…
03medium
Exceeding a hash table's load factor triggers a resize/rehash.

03 Interview questions

browse all ↗

What gets asked on this topic — tap a card for how to approach it, the follow-ups, and the trap. Company tags are best-effort & sourced.

Commonly asked junior concept very common How does a hash map achieve average O(1) lookup, and why is the worst case O(n)?
A hash function maps a key to a bucket index, so with a good hash and a reasonable load factor most buckets hold ~1 entry and lookup is average O(1). The worst case is O(n) when many keys collide into the same bucket (bad hash, adversarial keys, or everything hashing the same), degrading a bucket into a linear scan. The O(1) is therefore an expected/average bound, not a guarantee.
Follow-ups they push on
- What makes a hash function 'good'?
- How can an attacker force the worst case (hash flooding)?
Red flag Stating O(1) as a hard worst-case guarantee instead of an average/expected one.
source: Hirist — Top HashMap Interview Questions ↗
Commonly asked mid concept common Compare separate chaining and open addressing for collision handling.
Separate chaining stores colliding keys in a per-bucket list (or tree, as Java 8+ does past a threshold), so it tolerates high load factors but pays pointer/indirection overhead. Open addressing keeps everything in the array and probes for the next free slot (linear/quadratic probing, double hashing); it's cache-friendlier but degrades sharply as load factor approaches 1 and complicates deletion. The choice trades memory locality against sensitivity to load factor.
Follow-ups they push on
- Why does deletion need tombstones in open addressing?
- Why does Java convert long chains into trees?
Red flag Describing chaining and open addressing as interchangeable without noting their load-factor and deletion behaviour differs.
source: GetSDEReady — HashMap & HashSet Interview Questions ↗
Commonly asked mid concept common What is a load factor, and what happens when it's exceeded?
Load factor is entries divided by buckets — a measure of how full the table is (Java's HashMap defaults to 0.75). When it's exceeded the table resizes: capacity roughly doubles and every key is rehashed into the larger array, an O(n) operation that happens rarely, keeping amortized insert O(1). A higher load factor saves memory but raises collision rates and slows lookups; a lower one wastes space.
Follow-ups they push on
- Why double the capacity rather than grow by a constant?
Red flag Thinking each insert that crosses the threshold is cheap, or that resize never happens.
source: Hirist — Top HashMap Interview Questions ↗
AmazonGoogleMetaAppleMicrosoftBloomberg junior coding very common Given an array and a target, return indices of two numbers that sum to the target.
Walk the array once, and for each value x check a hash map for target - x; if present you've found the pair, otherwise store x -> index and continue. This is O(n) time, O(n) space — the canonical 'use a hash map to remember what you've seen' problem. The brute-force double loop is O(n^2); the hash map trades space for that speedup.
Follow-ups they push on
- What changes if the array is already sorted?
- How would you return all unique pairs (3Sum-style)?
Red flag Matching an element with itself by checking the map before inserting the current element incorrectly.
source: LeetCode 1 — Two Sum (company tags) ↗
AmazonMetaGoogleUber mid coding common Group a list of strings into anagrams.
Use a hash map keyed by a canonical form of each word and collect words sharing a key. The canonical key is either the sorted characters (O(k log k) per word) or a 26-length character-count signature (O(k) per word); the latter is faster for long strings. Total time is about O(n*k), space O(n*k). The trick the interviewer is probing is choosing a good collision-free key.
Follow-ups they push on
- Which key is better when words are long, and why?
Red flag Comparing every pair of words for the anagram relation (O(n^2 * k)) instead of bucketing by a canonical key.
source: LeetCode 49 — Group Anagrams (company tags) ↗
MetaAmazonGoogle mid coding common Count the number of contiguous subarrays whose sum equals k.
Track a running prefix sum and a hash map of how many times each prefix sum has occurred; at each index, the count of subarrays ending here equals the number of earlier prefix sums equal to prefixSum - k. Seed the map with {0: 1} to count subarrays starting at index 0. This is O(n) time, O(n) space, versus the O(n^2) brute force.
Follow-ups they push on
- Why must the map be seeded with prefix sum 0?
Red flag Forgetting the `{0:1}` seed, which drops every subarray that starts at index 0.
source: LeetCode 560 — Subarray Sum Equals K (company tags) ↗
Commonly asked mid concept common When is a hash map the wrong data structure? What do you reach for instead?
A hash map gives no ordering, so it's wrong when you need sorted iteration, the min/max, or range queries ('all keys between A and B'). For those, use an ordered/tree-based map (red-black tree, like Java's TreeMap or C++ std::map) giving O(log n) ordered operations, or a heap when you only need the extreme. Hash maps shine for pure key lookup, dedup, and frequency counting.
Follow-ups they push on
- What does a TreeMap give you that a HashMap can't?
Red flag Defaulting to a hash map for problems that need ordering or range scans and then bolting on a sort every query.
source: AlgoArk — Hash Map Patterns for Interviews ↗
MetaAmazonGoogle senior coding occasional Design a structure with insert, delete, and getRandom all in average O(1).
Combine a dynamic array (for O(1) random access by index) with a hash map from value to its index in the array. Insert appends and records the index; delete swaps the target with the last element, pops the tail, and fixes the moved element's index; getRandom picks a random array index. The swap-with-last trick is what keeps delete O(1) instead of O(n).
Follow-ups they push on
- How do you support duplicate values?
Red flag Deleting by shifting the array (O(n)) instead of swapping the victim with the last element.
source: LeetCode 380 — Insert Delete GetRandom O(1) (company tags) ↗
AmazonBloombergAppleMicrosoftGoogle junior coding common Find the first non-repeating character in a string and return its index.
Make one pass to build a hash map (or 26-length array) of character counts, then a second pass over the string returning the index of the first character whose count is 1; return -1 if none. Two linear passes, O(n) time, O(1) space if the alphabet is fixed (at most 26/128 entries). The second pass must walk the original string order, not the map, because a map has no positional order.
What a strong answer covers
- Pass 1: count frequencies in a map/array.
- Pass 2: scan the string in order, return first index with count 1.
- O(n) time; O(1) space for a fixed alphabet.
- Iterate the string (ordered), not the map (unordered), in pass 2.
Follow-ups they push on
- Why can't you find the answer by iterating the hash map directly?
- How would you support a streaming version where characters arrive over time?
Red flag Iterating the map instead of the string in pass 2 and returning a non-first unique character because maps lack order.
source: LeetCode 387 — First Unique Character in a String (company tags) ↗
★ must-know AmazonGoogleMetaBloomberg mid coding common Find the length of the longest consecutive sequence of integers in an unsorted array, in O(n).
Put every number in a hash set for O(1) membership, then for each number x start counting a run only if x - 1 is absent (so x is a sequence start); from such a start, extend x, x+1, x+2, … while present and track the longest. Starting only at run-beginnings means each number is visited O(1) times overall, giving O(n) time, O(n) space — beating the O(n log n) sort-then-scan.
What a strong answer covers
- Set membership gives O(1) 'is this number present?' checks.
- Only begin counting where x-1 is missing (a run start).
- That guard bounds total work to O(n), not O(n²).
- Beats sorting (O(n log n)) by trading time for O(n) space.
Quick self-check
Why is the algorithm O(n) and not O(n²) despite the inner while-loop?
Follow-ups they push on
- Why does the 'only start where x-1 is absent' check keep it O(n)?
- What if duplicates are present in the input?
Red flag Extending a run from every element (O(n^2)) instead of only from numbers that begin a run.
source: LeetCode 128 — Longest Consecutive Sequence (company tags) ↗
Commonly asked senior concept occasional What makes a good hash function, and what is 'hash flooding' (algorithmic complexity attack)?
A good hash function distributes keys uniformly across buckets, is fast to compute, and is deterministic — minimizing collisions so buckets stay ~O(1). Hash flooding is a denial-of-service attack where an adversary crafts many keys that all hash to the same bucket, collapsing every lookup/insert to O(n) and the whole table to O(n^2) work — historically used to DoS web servers via crafted POST/query parameters. Defenses include per-process randomized/seeded hashing (SipHash) so an attacker can't predict the bucket, and converting long collision chains into balanced trees (Java 8+ does this).
What a strong answer covers
- Good hash: uniform, fast, deterministic ⇒ low collision rate.
- Hash flooding forces worst-case collisions ⇒ O(n) ops, O(n²) total (DoS).
- Defense 1: seeded/randomized hashing (e.g. SipHash) hides the mapping.
- Defense 2: treeify long chains (O(n) → O(log n) within a bucket).
Follow-ups they push on
- Why does a per-process random seed defeat the attack?
- How does treeifying long buckets bound the worst case?
Red flag Assuming worst-case collisions only happen by chance and ignoring that they can be deliberately induced.
source: GeeksforGeeks — Hash Functions and Hashing ↗
★ must-know Commonly asked mid concept common Why must objects used as hash-map keys be effectively immutable, and what is the equals/hashCode contract?
A hash map places a key in a bucket derived from its hash; if you mutate a key after insertion so its hash changes, the entry is now in the 'wrong' bucket and lookups silently fail to find it. The equals/hashCode contract is the rule that ties them together: if two objects are equal they must have the same hash code, and equal objects must stay equal — so keys should be immutable (or at least their hash-relevant fields must be). Override hashCode whenever you override equals, or hash-based collections break.
What a strong answer covers
- Bucket is chosen from the key's hash at insert time.
- Mutating a key's hash-relevant fields strands the entry in the wrong bucket.
- Contract: equal objects ⇒ equal hash codes (not vice-versa).
- Override equals ⇒ you must override hashCode too.
Quick self-check
You override `equals` to compare two fields but leave the default `hashCode`. What breaks?
Follow-ups they push on
- What goes wrong if hashCode is constant for all keys?
- Why is using a mutable list as a key dangerous?
Red flag Overriding `equals` but not `hashCode` (or mutating a key in place), so lookups for present keys return nothing.
source: Oracle Java SE — Object.hashCode() contract ↗
Commonly asked junior concept occasional When should you use a hash set vs a hash map?
A hash set stores keys only and answers 'have I seen this?' — use it for membership, dedup, and presence checks. A hash map stores key→value associations — use it when you also need data attached to each key (counts, indices, last-seen position). Both give average O(1) ops; a set is essentially a map whose values you don't care about. Reach for the map the moment you need to remember *something about* each key, not just *that* you saw it.
What a strong answer covers
- Set: membership / dedup / 'seen?' — keys only.
- Map: key → value — counts, indices, metadata per key.
- Both average O(1); a set is a valueless map.
- Two Sum needs a map (value→index); 'contains duplicate' needs only a set.
Quick self-check
You must return the index of a matching earlier element. Set or map?
Follow-ups they push on
- Which would you use for Two Sum, and why not the other?
- Which for 'does this array contain any duplicate'?
Red flag Using a set when you later need the associated value (e.g. an index), forcing an awkward rework.
source: AlgoArk — Hash Map Patterns for Interviews ↗
AmazonAppleGoogle junior coding common Determine whether an array contains any duplicate values.
Walk the array once, inserting each value into a hash set; if a value is already present, return true immediately, otherwise return false at the end. O(n) time, O(n) space. Alternatively sort first and check adjacent equal pairs for O(n log n) time and O(1) extra space — a clean time/space trade-off to mention.
What a strong answer covers
- Hash set: insert each, return true on the first repeat.
- O(n) time, O(n) space.
- Sort-and-scan alternative: O(n log n) time, O(1) extra space.
- Early exit on first duplicate; no need to finish the scan.
Follow-ups they push on
- What if duplicates only count when within k indices of each other?
- How would you do it with O(1) extra space?
Red flag Comparing all pairs with a double loop (O(n^2)) when a single hash-set pass is O(n).
source: LeetCode 217 — Contains Duplicate (company tags) ↗
Commonly asked mid concept occasional Why does iterating a hash map give no guaranteed order, and what should you use if you need ordering?
Entries are placed by hash value into buckets, so iteration order reflects the internal bucket layout — which changes with the hash function, capacity, and resizes — not insertion or sort order. If you need a stable order, use an insertion-ordered map (Java's LinkedHashMap, Python's dict since 3.7) for insertion order, or a tree/sorted map (TreeMap, C++ std::map) for key-sorted order at O(log n) per op. Never rely on a plain hash map's iteration order; it's an implementation detail that can differ across runs or versions.
What a strong answer covers
- Iteration order follows bucket layout, not insertion or sort order.
- Order can change after a resize/rehash or across language versions.
- Need insertion order ⇒ LinkedHashMap / Python dict.
- Need sorted order ⇒ TreeMap / std::map (O(log n) ops).
Quick self-check
You need keys returned in sorted order on every iteration. Which structure?
Follow-ups they push on
- Python dicts preserve insertion order since 3.7 — is that the same as 'sorted'?
- What ordering does a TreeMap give, and at what cost?
Red flag Depending on a plain hash map's iteration order in tests or logic, then breaking when it changes.
source: AlgoArk — Hash Map Patterns for Interviews ↗