A collection of non-overlapping (disjoint) sets. Every element belongs to exactly one set at any time. Each set has a designated representative element.
Given universe U = {x₁, x₂, ..., xₙ}:
S₁, S₂, ..., Sₖ such that Sᵢ ∩ Sⱼ = ∅ for i ≠ jS₁ ∪ S₂ ∪ ... ∪ Sₖ = UThe disjoint set data structure supports a dynamic partition that starts with singletons and merges sets over time — but never splits them.
| Operation | Description | Effect |
|---|---|---|
| Make-Set(x) | Create a new set containing only x | x becomes its own representative |
| Find(x) | Return the representative of the set containing x | Read-only (with optimisations) |
| Union(x, y) | Merge the sets containing x and y | One representative chosen for the merged set |
Make-Set(0), Make-Set(1), ..., Make-Set(n-1)
Union(0, 1)
Union(2, 3)
Find(1) == Find(0) # true -- same set
Find(1) == Find(2) # false -- different sets
Union(1, 3)
Find(0) == Find(2) # true -- now merged
Find answers the connectivity question: are two elements in the same set? Union establishes new connections. Neither operation ever separates elements.
Each set is a linked list. Every node stores a pointer to the head (representative). Union appends one list to another and updates all head pointers in the shorter list.
| Operation | Cost |
|---|---|
| Make-Set | O(1) |
| Find | O(1) — follow pointer to head |
| Union | O(n) worst case |
A sequence of n - 1 Union operations on n singletons can cost O(n²) total if we always append the longer list to the shorter.
Using the weighted-union heuristic (always append shorter to longer), the total cost drops to O(m + n log n).
The linked-list approach is simple but the O(n) Union cost motivates the far superior forest representation.
Represent each set as a rooted tree. Each element stores a single parent pointer. The root points to itself and is the representative.
def make_set(x):
parent[x] = x
def find(x):
while parent[x] != x:
x = parent[x]
return x
def union(x, y):
rx, ry = find(x), find(y)
if rx != ry:
parent[ry] = rx
parent[]: 0 0 0 3 3
index: 0 1 2 3 4
Tree 1: 0 Tree 2: 3
/ \ |
1 2 4
| Operation | Cost |
|---|---|
| Make-Set | O(1) |
| Find | O(n) worst case |
| Union | O(n) worst case |
Without heuristics, n Unions can build a degenerate chain of depth n - 1. Two optimisations fix this.
Maintain a rank for each root — an upper bound on the height of its subtree. When merging, attach the shorter tree under the taller one.
def make_set(x):
parent[x] = x
rank[x] = 0
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return
if rank[rx] < rank[ry]:
rx, ry = ry, rx
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
r have at least 2ʳ nodes⌊log₂ n⌋Rank only increases when two trees of equal rank merge. A node's rank never changes once it becomes a non-root.
Union by rank alone reduces worst-case Find from O(n) to O(log n). Combined with path compression it approaches O(1).
Store the actual count of nodes in each root's subtree. Always attach the smaller tree under the larger root.
def make_set(x):
parent[x] = x
size[x] = 1
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry:
return
if size[rx] < size[ry]:
rx, ry = ry, rx
parent[ry] = rx
size[rx] += size[ry]
| Property | By rank | By size |
|---|---|---|
| Storage | Small int | Up to n |
| Height bound | ⌊log₂ n⌋ | ⌊log₂ n⌋ |
| Practical speed | Slightly faster | Slightly more memory |
| Path compression | Rank becomes stale | Size remains exact |
Both achieve the same asymptotic bound. Union by rank is more common in textbooks; union by size is useful when you need the actual set cardinality.
During Find(x), make every node on the path from x to the root point directly to the root. Flattens the tree on every query.
def find(x):
if parent[x] != x:
parent[x] = find(parent[x])
return parent[x]
Path compression alone (without union by rank) gives amortised O(log n). Combined with union by rank: O(α(n)) amortised.
Before: After:
0 0
| / | \
1 1 2 4
| |
2 3
|
3
|
4
Path compression is a retrospective optimisation — it does not prevent tall trees from forming, but flattens them after traversal.
Every node on the find path points to its grandparent. Single-pass, iterative.
def find(x):
while parent[x] != x:
next = parent[x]
parent[x] = parent[next]
x = next
return x
Every other node on the find path points to its grandparent.
def find(x):
while parent[x] != x:
parent[x] = parent[parent[x]]
x = parent[x]
return x
| Technique | Nodes updated | Amortised | Stack |
|---|---|---|---|
| Full compression | All on path | O(α(n)) | O(log n) |
| Path splitting | All on path | O(α(n)) | O(1) |
| Path halving | ~Half | O(α(n)) | O(1) |
All three achieve the same amortised bound with union by rank. Path splitting and halving are preferred in practice — iterative and stack-free.
For m operations on n elements:
Total: O(m · α(n))
| n | α(n) |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 16 | 2 |
| 65 536 | 3 |
| 265536 | 4 |
For any conceivable input size, α(n) ≤ 4. Union-find operations are effectively constant time in practice.
A rapidly growing function defined recursively:
A(0, j) = j + 1A(i, 0) = A(i - 1, 1) for i > 0A(i, j) = A(i - 1, A(i, j - 1)) for i, j > 0α(n) = min { k ≥ 0 : A(k, 0) ≥ n }
Since A grows astronomically fast, α grows unimaginably slowly.
α(n)Fredman & Saks (1989) proved Ω(m · α(n)) is a lower bound in the cell-probe model. Union by rank + path compression is asymptotically optimal.
A simplified union-find using union by size with a flat array. Popular in algorithms courses (Princeton, Coursera).
public class WeightedQuickUnionUF {
private int[] parent;
private int[] sz;
private int count;
public WeightedQuickUnionUF(int n) {
parent = new int[n];
sz = new int[n];
count = n;
for (int i = 0; i < n; i++) {
parent[i] = i; sz[i] = 1;
}
}
public int find(int p) {
while (p != parent[p]) p = parent[p];
return p;
}
public void union(int p, int q) {
int rP = find(p), rQ = find(q);
if (rP == rQ) return;
if (sz[rP] < sz[rQ]) {
parent[rP] = rQ; sz[rQ] += sz[rP];
} else {
parent[rQ] = rP; sz[rP] += sz[rQ];
}
count--;
}
}
| Operation | Cost |
|---|---|
| Find | O(log n) |
| Union | O(log n) |
| + path compression | O(α(n)) amortised |
Weighted quick-union is the go-to teaching implementation. Simple, efficient, and easy to extend with path compression.
Standard union-find is destructive — path compression and union modify the structure in place. Sometimes we need to query past versions.
Each node stores a list of (timestamp, parent) pairs. Space: O(n + m), Find: O(log n · log m).
Copy the path from changed node to root on each modification. Space: O(m log n).
Sleator-Tarjan persistent arrays backing parent/rank. O(log n) amortised per version.
A practical persistent union-find using a semi-persistent array (backtrackable). Used in SMT solvers and proof assistants (OCaml).
O(α(n)) operationsO(log n) per access with rerootingPersistence sacrifices the O(α(n)) amortised bound but enables version queries — essential in functional programming and backtracking search.
Support Union and Find as usual, plus Rollback to undo the last Union. Key constraint: do not use path compression (it makes rollback impossible without persistence).
stack = []
def union(x, y):
rx, ry = find(x), find(y)
if rx == ry: return
if rank[rx] < rank[ry]:
rx, ry = ry, rx
stack.append((ry, parent[ry], rank[rx]))
parent[ry] = rx
if rank[rx] == rank[ry]:
rank[rx] += 1
def rollback():
ry, old_parent, old_rank = stack.pop()
rx = parent[ry]
rank[rx] = old_rank
parent[ry] = old_parent
When all queries are known in advance, process them in an optimal order (e.g., divide-and-conquer on a timeline). Used in competitive programming for "dynamic connectivity" problems.
Rollback union-find is O(log n) per operation (no path compression). You gain undo capability at the cost of the α(n) speedup.
Divide the timeline into segments. At each node of the segment tree, apply the unions active during that segment, recurse, then rollback.
(u, v, w) in sorted order:Find(u) ≠ Find(v), add edge to MST and Union(u, v)n - 1 edges| Step | Cost |
|---|---|
| Sort edges | O(E log E) |
n Make-Set | O(n) |
2E Find + n-1 Union | O(E · α(n)) |
| Total | O(E log E) |
Without union-find, Kruskal's would need O(V) connectivity checks per edge, giving O(VE) total. Union-find makes it O(E log E).
for v in vertices:
make_set(v)
for (u, v) in edges:
union(u, v)
# Connected iff find(u) == find(v)
Total: O(V + E · α(V)) — effectively linear.
As edges arrive online, union-find maintains connected components without recomputation. Each new edge triggers at most one Union.
Track a counter initialised to V. Decrement on each successful Union. The counter always holds the current number of connected components.
| Method | Time | Online insert |
|---|---|---|
| BFS/DFS | O(V + E) | Must rerun |
| Union-Find | O(V + E·α(V)) | O(α(V)) per edge |
For incremental connectivity, union-find is strictly superior to graph traversal algorithms.
Treat each pixel as a node. Connect adjacent pixels whose intensity difference is below a threshold.
for each pixel p:
make_set(p)
for each adjacent pair (p, q):
if |intensity(p) - intensity(q)| < T:
union(p, q)
O(n log n) for n pixels (dominated by edge sorting)A graph-based segmentation algorithm using union-find at its core:
Union-find gives image segmentation algorithms their speed — merging millions of pixel regions in near-linear time.
An n × n grid of sites, each independently open with probability p. The system percolates if there is a connected path of open sites from top to bottom.
n² + 2 elements: one per site + virtual top + virtual bottomFind(top) == Find(bottom)Randomly open sites one at a time. Track when percolation first occurs. Over many trials, estimate the percolation threshold p*.
For a square lattice: p* ≈ 0.5927 (site percolation)
Each site opening requires up to 4 Union operations. Testing for percolation is a single Find comparison. Total cost per simulation: O(n² · α(n²)).
Percolation is a classic application from statistical physics. Union-find enables Monte Carlo estimation with millions of trials.
A disjoint set maintains equivalence classes under a relation that is:
x ~ x (Make-Set ensures every element is equivalent to itself)x ~ y ⇒ y ~ x (Union is symmetric)x ~ y, y ~ z ⇒ x ~ z (maintained through shared representatives)Extend union-find to handle function symbols: if a ~ b then f(a) ~ f(b). Used in SMT solvers for the theory of equality with uninterpreted functions.
In Hindley-Milner type inference, union-find merges type variables when constraints are discovered.
Merge registers that must hold the same value during register allocation.
Identify equivalent expressions to avoid redundant computation.
O(n) per op; the forest representation is the foundation for all efficient variantsO(log n) worst caseα(n) ≤ 4 for all practical n — effectively constantO(α(n)) for version access or undo| Source | Description |
|---|---|
| Tarjan (1975) | Original inverse Ackermann analysis |
| CLRS Ch. 21 | Data Structures for Disjoint Sets |
| Sedgewick & Wayne | Algorithms, 4th ed. — Union-Find case study |
| Fredman & Saks | The Ω(α(n)) lower bound (1989) |
| Conchon & Filliâtre | Persistent Union-Find (2007) |
| Felzenszwalb & Huttenlocher | Graph-Based Image Segmentation (2004) |