COMPUTER SCIENCE FUNDAMENTALS SERIES

Hash Tables

Hash functions · Collision resolution · Open addressing ·
Cuckoo hashing · Load factor · Concurrent maps
Mid-level software engineer track  ·  20 slides
02

Dictionary / Map ADT

The abstract interface

A dictionary (also called a map or associative array) supports three core operations on key-value pairs:

  • INSERT(k, v) — associate value v with key k
  • SEARCH(k) — return the value for key k, or report absent
  • DELETE(k) — remove the pair with key k

Hash tables achieve amortised constant time by trading deterministic guarantees for expected-case performance. The key insight: convert the key to an array index via a hash function.

Naive implementations

Data structure    INSERT      SEARCH      DELETE
Unsorted array    O(1)        O(n)        O(n)
Sorted array      O(n)        O(log n)    O(n)
Balanced BST      O(log n)    O(log n)    O(log n)
Hash table        O(1) exp.   O(1) exp.   O(1) exp.
03

Direct Addressing

When keys are small integers

If every possible key is in {0, 1, ..., U-1} and U is small, store values in a plain array T[0..U-1].

T[k] = v       // INSERT
T[k]           // SEARCH
T[k] = NIL     // DELETE

All operations are O(1) worst case.

Direct addressing is the conceptual foundation. A hash table generalises it by compressing the key space via a hash function.
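The three operations above can be sketched in a few lines of Python (the universe size U and the value names are illustrative):

```python
# Direct-address table: a plain array indexed by the key itself.
# Works only when keys are small non-negative integers below U.
U = 10
T = [None] * U

def da_insert(k, v): T[k] = v      # INSERT — O(1) worst case
def da_search(k):    return T[k]   # SEARCH — O(1) worst case
def da_delete(k):    T[k] = None   # DELETE — O(1) worst case

da_insert(2, "val_A")
da_insert(5, "val_B")
assert da_search(2) == "val_A"
da_delete(2)
assert da_search(2) is None
```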

The problem

  • When U is large (e.g. 64-bit integers, strings), allocating an array of size U is infeasible
  • When only n << U keys are used, most of the array is wasted
  • Solution: hash the key into a smaller index range [0, m-1] where m = O(n)
[Figure: direct-address table T[0..9] — keys 2, 5, 8 hold val_A, val_B, val_C; only 3 of 10 slots used]
04

Hash Functions

Requirements

A hash function h(k) maps a key from universe U to an index in {0, 1, ..., m-1}.

  • Deterministic — same key always produces the same index
  • Uniform distribution — keys spread evenly across slots
  • Fast to compute — ideally O(1) or O(|key|)

Division method

h(k) = k mod m

Simple but sensitive to choice of m. Avoid powers of 2. Choose m as a prime not close to a power of 2.

Multiplication method

h(k) = floor(m * (k * A mod 1))
// where 0 < A < 1
// Knuth: A = (sqrt(5) - 1) / 2 ≈ 0.618
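Both methods fit in a few lines of Python (m = 13 is an arbitrary illustrative prime):

```python
import math

m = 13                        # a prime not close to a power of 2
A = (math.sqrt(5) - 1) / 2    # Knuth's suggested multiplier ≈ 0.618

def h_division(k):
    return k % m

def h_multiplication(k):
    return math.floor(m * ((k * A) % 1))   # fractional part of k*A, scaled by m

# Both land in {0, ..., m-1} for any integer key:
for k in (7, 123456, 2**40 + 3):
    assert 0 <= h_division(k) < m
    assert 0 <= h_multiplication(k) < m
```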

Practical hash functions

Function              Use case
MurmurHash3           General-purpose, fast, good distribution
xxHash                Extremely fast, widely used in checksums
SipHash               Hash-flooding resistant — default in Python, Rust
FNV-1a                Simple, byte-at-a-time, good for short keys
CityHash / FarmHash   Google's fast hashes for strings

Cryptographic hashes (SHA-256) provide stronger guarantees but are much slower. Use them for security, not hash tables.

05

Universal Hashing

The adversary problem

Any single deterministic hash function has a worst case: an adversary can craft n keys that all hash to the same slot, degrading every operation to O(n).

Universal hash families

A family H is universal if for any two distinct keys x ≠ y:

Pr[h(x) = h(y)] ≤ 1/m
// where h chosen randomly from H

Carter-Wegman construction

For prime p ≥ |U|:

h_{a,b}(k) = ((a * k + b) mod p) mod m
// a ∈ {1, ..., p-1},  b ∈ {0, ..., p-1}
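The construction is easy to sketch in Python — the prime p and slot count m here are arbitrary illustrative choices; keys must lie in {0, ..., p-1}:

```python
import random

p = 2_147_483_647    # a prime bounding the key universe (2^31 - 1)
m = 64               # number of slots

def random_universal_hash():
    a = random.randrange(1, p)    # a ∈ {1, ..., p-1}
    b = random.randrange(0, p)    # b ∈ {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

# Each table instance draws one h at construction time:
h = random_universal_hash()
assert all(0 <= h(k) < m for k in range(1000))
```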

Properties

Expected chain length is n/m regardless of key distribution

No adversary can force worst-case behaviour without knowing a, b

Choosing a new random h at construction time is sufficient — no per-query randomness needed

Stronger families

  • k-independent — any k distinct keys hash independently
  • 2-independent ≈ universal; 5-independent suffices for linear probing to achieve O(1) expected
  • Tabulation hashing — 3-independent, fast, practical; suffices for chaining and cuckoo hashing
06

Separate Chaining

Collision resolution with linked lists

Each slot T[i] points to a linked list of all elements whose keys hash to i.

Operations

  • INSERT — compute h(k), prepend to list at T[h(k)] — O(1)
  • SEARCH — compute h(k), traverse list at T[h(k)] — O(chain length)
  • DELETE — search then unlink — O(chain length)

Performance

With load factor α = n/m and a good hash function:

  • Expected chain length = α
  • Expected search time = O(1 + α)
  • If n = O(m), all operations are O(1) expected

Chaining is simple and tolerates α > 1 gracefully. Downside: pointer chasing is cache-unfriendly.
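A minimal chaining table in Python — lists stand in for linked lists, and the class and method names are illustrative:

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def insert(self, k, v):
        chain = self.slots[hash(k) % self.m]
        for i, (ck, _) in enumerate(chain):
            if ck == k:
                chain[i] = (k, v)     # key already present: update in place
                return
        chain.append((k, v))          # O(1) append on a miss

    def search(self, k):
        for ck, cv in self.slots[hash(k) % self.m]:
            if ck == k:
                return cv
        return None                   # absent

    def delete(self, k):
        i = hash(k) % self.m
        self.slots[i] = [(ck, cv) for ck, cv in self.slots[i] if ck != k]

t = ChainedHashTable()
t.insert("a", 1)
t.insert("b", 2)
assert t.search("a") == 1
t.delete("a")
assert t.search("a") is None
```

Because chains absorb collisions, the table keeps working even at α > 1 — only the chain lengths grow.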

[Figure: separate chaining — table T[0..7], each slot heading a linked list of colliding keys; empty slots point to NIL]
07

Open Addressing — Linear & Quadratic Probing

Open addressing principle

All elements stored directly in the table array. No pointers, no linked lists. On collision, probe a sequence of alternative slots until an empty one is found.

Linear probing

h(k, i) = (h'(k) + i) mod m
// for i = 0, 1, 2, ...

Slots checked sequentially. Cache-friendly but suffers from primary clustering — runs of occupied slots grow and merge.
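A minimal linear-probing sketch in Python (deletion is omitted — it needs tombstones or backward shifting; names are illustrative):

```python
class LinearProbingTable:
    def __init__(self, m=16):
        self.m = m
        self.keys = [None] * m
        self.vals = [None] * m

    def _probe(self, k):
        # h(k, i) = (h'(k) + i) mod m — walk until we find k or an empty slot
        i = hash(k) % self.m
        while self.keys[i] is not None and self.keys[i] != k:
            i = (i + 1) % self.m
        return i

    def insert(self, k, v):
        i = self._probe(k)
        self.keys[i], self.vals[i] = k, v

    def search(self, k):
        i = self._probe(k)
        return self.vals[i] if self.keys[i] == k else None

t = LinearProbingTable()
t.insert(1, "one")
t.insert(17, "clash")        # 17 mod 16 = 1 — collides, probes on to slot 2
assert t.search(17) == "clash"
```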

Quadratic probing

h(k, i) = (h'(k) + c1*i + c2*i²) mod m

Spreads probes more than linear. Avoids primary clustering but can cause secondary clustering.

Comparison

Method       Cache        Clustering    Coverage
Linear       Excellent    Primary       All slots
Quadratic    Good         Secondary     May miss slots if m not prime

Linear probing is used in practice more than theory suggests — modern CPUs love sequential memory access. Swiss Table proves this decisively.

Primary clustering — occupied runs grow:

_ k1 k4 k7 k2 k9 _ k3 _ _

↑ cluster of 5 — new inserts extend it further

08

Double Hashing

Two independent hash functions

h(k, i) = (h1(k) + i * h2(k)) mod m
// h2(k) must never return 0
// Common: h2(k) = 1 + (k mod (m - 1)), m prime

Why it works

  • Each key generates a distinct probe sequence — no clustering
  • Approximates the ideal of uniform probing
  • Expected probes for unsuccessful search: 1 / (1 - α)
  • Expected probes for successful search: (1/α) * ln(1 / (1 - α))
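The divergence of probe sequences is easy to check numerically (m = 11 is an arbitrary small prime; integer keys so the arithmetic is visible):

```python
m = 11   # prime table size

def h1(k): return k % m
def h2(k): return 1 + (k % (m - 1))   # never 0, so every slot stays reachable

def probe_sequence(k, length=4):
    # h(k, i) = (h1(k) + i * h2(k)) mod m
    return [(h1(k) + i * h2(k)) % m for i in range(length)]

# Keys 3 and 14 collide on h1, but their sequences immediately diverge
# because h2(3) = 4 while h2(14) = 5:
assert h1(3) == h1(14) == 3
assert probe_sequence(3) == [3, 7, 0, 4]
assert probe_sequence(14) == [3, 8, 2, 7]
```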

As α → 1, performance degrades sharply. Open-addressing tables should resize well before reaching α = 1 — typically at α ∈ [0.5, 0.75].

Probe count at various load factors

Load α    Unsuccessful    Successful
0.50      2.0 probes      1.4 probes
0.75      4.0 probes      1.8 probes
0.90      10.0 probes     2.6 probes
0.95      20.0 probes     3.2 probes

Probing comparison

Linear — fast cache, primary clustering

Quadratic — better spread, secondary clustering

Double — no clustering, two hash computations

09

Load Factor & Performance

Load factor

α = n / m
// n = stored elements, m = table capacity
  • Chaining: α can exceed 1.0. Performance degrades linearly: O(1 + α)
  • Open addressing: α must stay below 1.0. Performance degrades non-linearly

Amortised O(1) argument

Maintain α ≤ c for constant c < 1 by doubling when threshold exceeded:

  • Each insertion is O(1) expected
  • Resizing costs O(n) but happens only every O(n) insertions
  • Amortised cost per insertion: O(1)
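The argument can be checked empirically — a toy Python count of element moves under doubling (the initial size and threshold are illustrative):

```python
def total_moves(n, threshold=0.75):
    """Count how many element moves n insertions cause with doubling."""
    m, size, moves = 8, 0, 0
    for _ in range(n):
        size += 1
        if size / m > threshold:
            moves += size      # rehash every element into the new table
            m *= 2
    return moves

# Total rehash work stays within a small constant factor of n,
# so the amortised cost per insertion is O(1):
assert total_moves(10_000) < 3 * 10_000
```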

Space-time trade-off

Load target    Memory overhead      Probe perf.
α ≤ 0.50       2x actual data       Excellent
α ≤ 0.75       1.33x actual data    Good
α ≤ 0.90       1.11x actual data    Acceptable / poor

Most real-world hash tables target α ∈ [0.5, 0.75]. Python's dict resizes at α = 2/3.

Expected longest chain (chaining)

For n keys in m slots with universal hashing, the expected longest chain is Θ(log n / log log n).

10

Resizing & Rehashing

When to resize

Typically when α exceeds a threshold (e.g. 0.75). Some implementations also shrink when α drops below a lower bound (e.g. 0.25).

The rehashing process

  1. Allocate new table of size m' = 2m (or next prime > 2m)
  2. For every element, compute h(k) mod m' and insert into the new table
  3. Free the old table

Cost: O(n) — must rehash every element because the modulus changed.

Incremental rehashing (Redis approach)

  • Maintain two tables simultaneously
  • On each operation, migrate a few keys from old to new
  • Lookups check both tables during the transition
  • After all keys migrated, free the old table
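The four steps above can be sketched in Python — plain dicts stand in for the two hash tables, and the batch size and class name are illustrative:

```python
class IncrementalMap:
    """Redis-style incremental rehash: migrate a few keys per operation."""
    def __init__(self):
        self.old, self.new = {}, None    # new is None outside a migration

    def start_resize(self):
        self.new = {}

    def _migrate(self, batch=2):
        for _ in range(min(batch, len(self.old))):
            k, v = self.old.popitem()
            self.new[k] = v
        if not self.old:                 # migration finished
            self.old, self.new = self.new, None

    def put(self, k, v):
        if self.new is not None:
            self._migrate()
        if self.new is not None:
            self.new[k] = v
            self.old.pop(k, None)        # the new table now owns this key
        else:
            self.old[k] = v

    def get(self, k):
        if self.new is not None:
            self._migrate()
        if self.new is not None and k in self.new:
            return self.new[k]
        return self.old.get(k)           # lookups check both tables
```

Every operation pays a small constant migration tax, so there is never an O(n) pause.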

Growth strategies

Strategy          New size           Amortised
Doubling          2m                 O(1)
1.5x growth       1.5m               O(1), lower constant
Prime stepping    next prime > 2m    O(1), better distribution

Shrinking is often omitted in practice — the memory cost of an oversized table is usually acceptable, and shrink-grow oscillation near the threshold is expensive.

[Figure: full rehash — all n keys move from the old table (m) to the new table (2m) in one O(n) pause; incremental rehash migrates k keys per operation, with no latency spike]
11

Robin Hood Hashing

Equalising probe distances

A variant of open addressing (typically linear probing) in which the probe distance of each element is tracked.

The Robin Hood rule

During insertion, if the new element's probe distance exceeds the occupant's, swap them — "steal from the rich, give to the poor."

Insert key K with probe distance d:
  While slot is occupied:
    If d > occupant's probe distance:
      Swap K with occupant
      Continue inserting displaced element
    Advance to next slot, d++
  Place K in empty slot
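The pseudocode above translates to a short runnable Python sketch (fixed-size table, no resizing, duplicate keys unhandled):

```python
M = 16
table = [None] * M           # slots hold (key, value, probe_distance)

def rh_insert(key, value):
    i = hash(key) % M
    entry = (key, value, 0)
    while table[i] is not None:
        if entry[2] > table[i][2]:            # occupant is "richer": swap
            entry, table[i] = table[i], entry # continue inserting the evictee
        i = (i + 1) % M
        entry = (entry[0], entry[1], entry[2] + 1)
    table[i] = entry

def rh_search(key):
    i, d = hash(key) % M, 0
    # Early termination: once d exceeds the occupant's distance, the key
    # cannot be further along, so it is absent.
    while table[i] is not None and d <= table[i][2]:
        if table[i][0] == key:
            return table[i][1]
        i, d = (i + 1) % M, d + 1
    return None

rh_insert(1, "a")
rh_insert(17, "b")           # collides with 1 at slot 1, shifts to slot 2
assert rh_search(17) == "b"
```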

Benefits

Reduced variance — all elements have similar probe lengths

Expected max probe — O(log log n), vs O(log n) for standard linear probing

Early termination — lookup stops if current probe distance exceeds occupant's

Simpler deletion — backward-shift instead of tombstones

Robin Hood hashing makes linear probing practical at higher load factors. Rust's HashMap used Robin Hood hashing before switching to hashbrown (Swiss Table).

12

Cuckoo Hashing

Worst-case O(1) lookup

Uses two hash functions h1 and h2 and two tables T1, T2.

  • SEARCH: check T1[h1(k)] and T2[h2(k)] — exactly 2 lookups, always
  • DELETE: check both locations, remove — O(1) worst case

Insertion algorithm

Insert key k:
  If T1[h1(k)] empty -> place k, done
  Evict x from T1[h1(k)], place k
  If T2[h2(x)] empty -> place x, done
  Evict y from T2[h2(x)], place x
  ... repeat (max iteration limit)
  If cycle -> rehash with new functions
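A runnable sketch of this loop in Python — two toy hash functions over small tables; a production version would use independent hashes and rehash on failure:

```python
M = 11
T1, T2 = [None] * M, [None] * M

def h1(k): return hash(k) % M
def h2(k): return (hash(k) // M) % M     # toy "second" hash, not independent

def cuckoo_search(k):
    for T, h in ((T1, h1), (T2, h2)):    # exactly two probes, always
        e = T[h(k)]
        if e is not None and e[0] == k:
            return e[1]
    return None

def cuckoo_insert(k, v, max_kicks=32):
    entry = (k, v)
    for _ in range(max_kicks):
        for T, h in ((T1, h1), (T2, h2)):
            i = h(entry[0])
            if T[i] is None:
                T[i] = entry
                return True
            entry, T[i] = T[i], entry    # evict the occupant and continue
    return False                          # probable cycle — caller rehashes

for k, v in ((1, "a"), (12, "b"), (23, "c")):   # all three collide on h1
    assert cuckoo_insert(k, v)
assert cuckoo_search(12) == "b"
```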

Properties

Property          Value
Lookup            O(1) worst case
Deletion          O(1) worst case
Insertion         O(1) amortised (with rehash)
Max load factor   ~50% (two tables) or higher with buckets
Rehash trigger    Insertion cycle detected

Cuckoo hashing guarantees constant-time lookups — ideal for network routers, hardware implementations, and read-heavy workloads.

[Figure: two cuckoo tables — T1: k1 _ k3 k5; T2: k2 k4 _ _]
13

Hopscotch Hashing

Bounded neighbourhood

Each slot has a "neighbourhood" of H consecutive slots (typically H = 32). An element hashing to slot i must reside within i to i + H - 1.

Lookup

Check all H slots in the neighbourhood — a single cache line or two. Uses a bitmap per slot to record which neighbours are occupied by elements hashing to that slot.

Insertion

  1. Find an empty slot (may be beyond the neighbourhood)
  2. Repeatedly swap elements to move the empty slot closer, into the neighbourhood
  3. If no swap chain can bring the empty slot within range, resize
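The bitmap-guided lookup can be sketched in Python — a toy insert that only succeeds when the neighbourhood has a free slot (the hole-displacement step is omitted; H and the table size are illustrative):

```python
H = 4
N = 10
keys = [None] * N
vals = [None] * N
hop  = [0] * N   # hop[i] bit j set => slot i+j holds a key that hashed to i

def hop_insert_nearby(k, v):
    i = hash(k) % N
    for j in range(H):
        s = (i + j) % N
        if keys[s] is None:
            keys[s], vals[s] = k, v
            hop[i] |= 1 << j
            return True
    return False   # real hopscotch would now swap a hole into range, or resize

def hop_search(k):
    i = hash(k) % N
    for j in range(H):                   # bounded: at most H probes
        if hop[i] & (1 << j):
            s = (i + j) % N
            if keys[s] == k:
                return vals[s]
    return None

assert hop_insert_nearby(2, "a")         # lands in slot 2
assert hop_insert_nearby(12, "b")        # collides at 2, lands in slot 3
assert hop_search(12) == "b"
```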

Advantages over linear probing

Bounded worst-case probe — at most H slots

Cache-friendly — neighbourhood fits in 1–2 cache lines

Concurrency-friendly — lock only the neighbourhood, not the whole table

No tombstones — just clear the bitmap bit and the slot

Hopscotch hashing combines the cache-friendliness of linear probing with a bounded probe guarantee. It was introduced by Herlihy, Shavit, and Tzafrir (2008), with concurrent implementations as a primary goal.

Neighbourhood of slot 3 (H = 4) spans slots 3–6.
Bitmap 1101 — elements hashing to slot 3 reside at slots 3, 4, and 6.

14

Swiss Table

Google's flat_hash_map

Open-addressing with linear probing, but with SIMD-accelerated metadata lookups. Used in Abseil (absl::flat_hash_map) and Rust's hashbrown.

Architecture

  • Table divided into groups of 16 slots
  • Each group has a 16-byte control byte array (one byte per slot)
  • Control byte: top 7 bits of hash (H2) + 1 bit for empty/deleted/full
  • Probe sequence walks over groups, not individual slots

SIMD-powered lookup

1. h1 = hash(key)
2. group = h1 mod num_groups
3. H2 = top 7 bits of hash
4. Load 16 ctrl bytes → SIMD register
5. Compare all 16 vs H2 in ONE instr
   (SSE2 _mm_cmpeq_epi8)
6. Bitmask → check only matches
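The control-byte scan can be simulated scalar-wise in Python (the real implementation compares all 16 bytes in a single SSE2 instruction; the control bytes below mirror the example group on this slide):

```python
EMPTY, DELETED = 0xFF, 0xFE

def group_match(ctrl: bytes, h2: int):
    """Return slot indices within a 16-byte group whose control byte
    equals the 7-bit hash fragment h2 (the SIMD step, done scalar-wise)."""
    return [i for i, b in enumerate(ctrl) if b == h2]

ctrl = bytes([0x4A, 0x7F, EMPTY, 0x1B, 0x4A, EMPTY, DELETED, 0x33,
              EMPTY, 0x5C, 0x0D, EMPTY, 0x72, 0x4A, EMPTY, 0x19])

# Only the matching slots need a full key comparison:
assert group_match(ctrl, 0x4A) == [0, 4, 13]
```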

Performance characteristics

Property         Value
Probe unit       16 slots at once (SIMD)
Empty check      Single SIMD comparison
Load factor      Up to ~87.5% efficiently
Deletion         "deleted" sentinel — no tombstone clustering
Memory layout    Keys/values stored flat, no pointers

Swiss Table is the state of the art for general-purpose hash tables. It proves that linear probing + SIMD beats more complex schemes in practice.

One group (16 control bytes):

4A 7F FF 1B 4A FF FE 33 FF 5C 0D FF 72 4A FF 19

4A = H2 match · FF = empty · FE = deleted

15

Perfect Hashing

Zero collisions by construction

A perfect hash function maps n keys to m slots with no collisions. A minimal perfect hash uses m = n.

FKS two-level scheme

Level 1: Hash n keys into m primary slots using a universal hash function.

Level 2: For each primary slot j with n_j keys, build a secondary table of size n_j² with its own universal hash function.

Level 1:  h1(k) → primary slot j
Level 2:  h_j(k) → position in
          secondary table for slot j
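The two-level scheme can be sketched in Python for a static key set — level-2 collisions are resolved by redrawing a fresh random hash, which succeeds in O(1) expected attempts because a table of size n_j² makes a collision less likely than 1/2 (the prime P and helper names are illustrative):

```python
import random

P = 2_147_483_647    # a prime larger than any key

def make_hash(m):
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build_fks(keys):
    n = len(keys)
    level1 = make_hash(n)                  # level 1: n primary slots
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[level1(k)].append(k)
    secondary = []
    for bucket in buckets:                 # level 2: size n_j² per slot
        m2 = len(bucket) ** 2
        while True:                        # redraw until collision-free
            h2 = make_hash(m2) if m2 else (lambda k: 0)
            slots = [None] * m2
            ok = True
            for k in bucket:
                i = h2(k)
                if slots[i] is not None:
                    ok = False
                    break
                slots[i] = k
            if ok:
                secondary.append((h2, slots))
                break
    return level1, secondary

def fks_lookup(level1, secondary, k):
    h2, slots = secondary[level1(k)]
    return bool(slots) and slots[h2(k)] == k   # 2 hashes, 2 lookups

keys = [3, 17, 42, 99, 256]
level1, secondary = build_fks(keys)
assert all(fks_lookup(level1, secondary, k) for k in keys)
```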

Space and time

Property       Value
Lookup         O(1) worst case — 2 hashes, 2 lookups
Total space    O(n) expected
Construction   O(n) expected
Static only    Keys must be known in advance

Practical tools

gperf · CMPH · BBHash · CHD

Perfect hashing is optimal for static dictionaries — compiler keyword tables, routing tables with fixed prefix sets. For dynamic sets, cuckoo hashing is the closest analogue.

16

Hash Tables in Practice — Python & Java

Python dict (CPython 3.6+)

  • Open addressing with compact layout
  • Separate hash array (indices) and dense entries array (preserves insertion order)
  • Hash function: SipHash (hash-flooding resistant)
  • Resize at α = 2/3; growth ~3x small, ~2x large
  • Compact dict saves ~25% memory vs pre-3.6

Java HashMap

  • Separate chaining with linked list per bucket
  • At α = 0.75, resize to 2 * capacity
  • Chain > 8 elements → treeified (red-black tree, O(log n) per bucket)
  • Chain < 6 → reverts to linked list
  • Hash: (h = key.hashCode()) ^ (h >>> 16)

Comparison

Feature               Python dict         Java HashMap
Strategy              Open addressing     Separate chaining
Order preserved       Yes (insertion)     No (use LinkedHashMap)
Treeification         No                  Yes (chains > 8)
Hash function         SipHash             hashCode() XOR-shifted
Default load factor   0.67                0.75
17

Hash Tables in Practice — Go & C++

Go map

  • Separate chaining with buckets — each bucket holds 8 key-value pairs
  • Overflow buckets linked when full
  • Incremental rehashing — grows and evacuates lazily
  • Hash: architecture-dependent (AES on amd64)
  • Not thread-safe — concurrent read/write triggers a fatal runtime error (unrecoverable, unlike a panic)
  • Iteration order randomised by design

C++ std::unordered_map

  • Separate chaining with linked list per bucket (mandated by iterator stability rules)
  • Default max_load_factor = 1.0
  • Poor cache locality — pointer-based chains
  • Widely considered a performance anti-pattern

Why std::unordered_map is slow

Issue                Cause
Pointer chasing      Each node is a separate heap allocation
Cache misses         Nodes scattered across memory
High load factor     Default 1.0 means long chains
Iterator stability   Standard requires references to stay valid — prevents open addressing

For performance-critical C++ code, use absl::flat_hash_map (Swiss Table), robin_map (Tessil), or ankerl::unordered_dense.

18

Thread-Safe Hash Maps

The problem

Concurrent reads and writes require synchronisation. A global lock serialises all operations and destroys throughput.

Lock striping (Java ConcurrentHashMap)

  • Partition the table into segments (e.g. 16)
  • Each segment has its own lock
  • Operations on different segments proceed in parallel
  • Java 8+: replaced segments with per-bucket CAS + synchronized on bucket head
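A minimal lock-striped map in Python — one lock per stripe, chosen by hash; illustrative only, since real implementations use much finer-grained synchronisation:

```python
import threading

class StripedMap:
    def __init__(self, stripes=16):
        self.locks = [threading.Lock() for _ in range(stripes)]
        self.maps = [{} for _ in range(stripes)]   # one partition per stripe

    def _stripe(self, k):
        return hash(k) % len(self.locks)

    def put(self, k, v):
        s = self._stripe(k)
        with self.locks[s]:        # only this stripe is locked
            self.maps[s][k] = v

    def get(self, k, default=None):
        s = self._stripe(k)
        with self.locks[s]:
            return self.maps[s].get(k, default)

m = StripedMap()
m.put("x", 1)
assert m.get("x") == 1
```

Operations that hash to different stripes never contend, so throughput scales with the stripe count under uniform access.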

Lock-free approaches

  • RCU — readers lock-free; writers copy, modify, swap pointer
  • Harris-style — CAS-based insertion/deletion on chains
  • Cliff Click's NonBlockingHashMap — lock-free open addressing via CAS

Comparison

Implementation    Read         Write          Resize
Global lock       Serial       Serial         Blocking
Lock striping     Parallel     Parallel       Per-segment
CAS per bucket    Lock-free    Lock-free      Cooperative
RCU               Wait-free    Locked write   Copy-on-write

For read-heavy workloads, RCU or lock-free maps offer the best throughput. For mixed workloads, CAS-based per-bucket locking (Java 8 ConcurrentHashMap) is the industry standard.

19

Applications & Limitations

Applications

Symbol tables — compilers, interpreters, debuggers

Caching — memoisation, LRU caches (hash map + doubly-linked list)

Databases — hash indexes, hash joins, in-memory tables

Networking — connection tracking, NAT tables, routing tables

Deduplication — seen-set for crawlers, stream processing

Counting — frequency tables, word counts, histograms

Sets — membership testing (hash set = hash map with no values)

Limitations

No ordering — no range queries, min/max, successor (use BST / B-tree)

Hash quality matters — poor functions cause clustering and O(n)

Hash-flooding attacks — adversarial inputs force worst case (mitigate with SipHash / random seeds)

Memory overhead — open addressing wastes empty slots; chaining wastes pointers

Resizing latency — rehashing n elements pauses the system (mitigate with incremental rehash)

Not scan-optimal — iterating all elements slower than a flat array

When you need sorted iteration or range queries, use a balanced BST (std::map, TreeMap) or B-tree.

20

Summary & Further Reading

Key takeaways

  • Hash tables provide O(1) expected time for insert, search, delete — the fastest dictionary for point queries
  • Hash function choice determines distribution quality and security against adversarial inputs
  • Separate chaining is simple but cache-unfriendly; open addressing keeps data contiguous
  • Robin Hood hashing equalises probe distances; cuckoo hashing guarantees O(1) worst-case lookup
  • Swiss Table (SIMD + linear probing) is the current state of the art
  • Perfect hashing achieves O(1) worst case for static key sets
  • Load factor management and amortised resizing keep operations constant-time
  • Thread safety requires lock striping, CAS, or RCU — never a single global lock

Recommended reading

Source             Description
CLRS               Introduction to Algorithms, chapters 11–12
Knuth              TAOCP Vol. 3 — Sorting & Searching, Section 6.4
Herlihy & Shavit   The Art of Multiprocessor Programming
Abseil             Swiss Table design notes
Pagh & Rodler      "Cuckoo Hashing" — J. Algorithms, 2004
Celis et al.       "Robin Hood Hashing" — FOCS 1985