Given a connected, undirected graph G = (V, E), a spanning tree T = (V, E') is a subgraph that is connected, contains no cycles, and has exactly |V| - 1 edges. An MST is a spanning tree whose total edge weight is minimised among all spanning trees of G.
| Property | Value |
|---|---|
| Edges | Exactly \|V\| - 1 |
| Cycles | None — adding any edge creates exactly one cycle |
| Cuts | Removing any edge disconnects into exactly two components |
| Uniqueness | Unique if all edge weights are distinct |
A graph can have exponentially many spanning trees. Cayley's formula: the complete graph K_n has n^(n−2) spanning trees.
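For small n, Cayley's count can be checked by brute force: enumerate every set of n − 1 edges of K_n and keep those that form a tree. A minimal sketch (the function name is illustrative):

```python
from itertools import combinations

def count_spanning_trees(n):
    """Count spanning trees of the complete graph K_n by brute force:
    try every set of n-1 edges and keep the acyclic (hence connected) ones."""
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        # Union-Find to test that the n-1 chosen edges form a single tree
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:        # these edges would close a cycle
                ok = False
                break
            parent[ru] = rv
        count += ok
    return count

print(count_spanning_trees(4))  # Cayley: 4^(4-2) = 16
```

For K_4 this finds 16 of the C(6, 3) = 20 three-edge subsets are trees; the 4 excluded ones are exactly the triangles.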
A cut partitions vertices into two non-empty sets (S, V \ S). A crossing edge has one endpoint in each set. A light edge is a crossing edge of minimum weight.
For any cut (S, V \ S), the minimum-weight crossing edge belongs to some MST.
The cut property is the foundation of every greedy MST algorithm — Kruskal's, Prim's, and Borůvka's all exploit it.
Assume the light edge e = (u, v) is not in MST T. Adding e to T creates a cycle, which must contain another edge e' crossing the cut (S, V \ S). Since e is the light edge, w(e) ≤ w(e'), so T ∪ {e} \ {e'} is a spanning tree of weight at most w(T) — hence some MST contains e.
For any cycle C in graph G, the maximum-weight edge on C does not belong to any MST (assuming distinct weights).
Assume the heaviest edge e on cycle C is in MST T. Removing e splits T into two components. The cycle C must contain another edge e' crossing this cut. Since w(e') < w(e), replacing e with e' gives a lighter spanning tree — contradiction.
| Rule | Description |
|---|---|
| Blue rule (cut) | Colour the lightest crossing edge of any cut blue — it is in the MST |
| Red rule (cycle) | Colour the heaviest edge in any cycle red — it is not in the MST |
Applying blue and red rules in any order until all edges are coloured correctly identifies the MST. This is the basis of the generic MST algorithm.
All MST algorithms follow this pattern:
```
A = {}  // MST edges
while A does not form a spanning tree:
    find a cut (S, V \ S) that respects A
    add the light edge crossing (S, V \ S) to A
return A
```
A cut respects a set A of edges if no edge in A crosses the cut.
At every step, the algorithm maintains the invariant that A is a subset of some MST. The cut property guarantees the light edge is safe to add.
| Algorithm | How the cut is chosen |
|---|---|
| Kruskal's | Sort all edges; process in weight order — each edge defines a cut between components |
| Prim's | Grow one tree; the cut is always (tree vertices, non-tree vertices) |
| Borůvka's | Each component finds its lightest outgoing edge simultaneously |
```
KRUSKAL(G):
    sort edges by weight
    T = {}
    for each (u, v, w) in sorted edges:
        if FIND(u) != FIND(v):
            T = T ∪ {(u, v)}
            UNION(u, v)
    return T
```
| Operation | Cost |
|---|---|
| Sort edges | O(E log E) |
| Union-Find operations | O(E α(V)) ≈ O(E) |
| Total | O(E log E) = O(E log V) |
Kruskal's is ideal for sparse graphs where E is close to V. The sorting step dominates.
| Operation | Description |
|---|---|
| MAKE-SET(x) | Create a singleton set containing x |
| FIND(x) | Return the representative (root) of the set containing x |
| UNION(x, y) | Merge the sets containing x and y |
```
FIND(x):
    if x != parent[x]:
        parent[x] = FIND(parent[x])   // path compression
    return parent[x]

UNION(x, y):
    rx, ry = FIND(x), FIND(y)
    if rank[rx] < rank[ry]: swap(rx, ry)
    parent[ry] = rx                   // union by rank
    if rank[rx] == rank[ry]: rank[rx]++
```
With both optimisations, m operations on n elements cost O(m · α(n)), where α is the inverse Ackermann function — effectively O(1) per operation.
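The pseudocode above translates directly into Python — a minimal sketch with both optimisations:

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by rank."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                  # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # union by rank
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

uf = UnionFind(4)
uf.union(0, 1)
uf.union(2, 3)
print(uf.find(0) == uf.find(1))  # True
print(uf.find(1) == uf.find(2))  # False
```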
| Step | Edge | Wt | Action | Components |
|---|---|---|---|---|
| 1 | B-D | 1 | Add | {A} {B,D} {C} {E} |
| 2 | A-B | 2 | Add | {A,B,D} {C} {E} |
| 3 | B-C | 3 | Add | {A,B,C,D} {E} |
| 4 | A-C | 4 | Skip | — |
| 5 | C-E | 5 | Add | {A,B,C,D,E} |
MST edges: {B-D, A-B, B-C, C-E} — total weight = 11
4 edges for 5 vertices — exactly |V| - 1.
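The trace above can be reproduced with a short Kruskal implementation; vertex names and weights are taken from the example, and a minimal Union-Find is inlined to keep the sketch self-contained:

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm: process edges in weight order, adding
    each edge that joins two distinct components."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
    return mst

edges = [("B", "D", 1), ("A", "B", 2), ("B", "C", 3), ("A", "C", 4), ("C", "E", 5)]
mst = kruskal("ABCDE", edges)
print(mst)                        # A-C is skipped: it would close a cycle
print(sum(w for _, _, w in mst))  # 11
```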
Prim's always grows a single tree. At each step the cut is (tree vertices, non-tree vertices) and the light edge is selected by the priority queue.
```
PRIM(G, s):
    for each v in V:
        key[v] = ∞, parent[v] = NIL
    key[s] = 0
    Q = min-priority-queue(V, key)
    while Q is not empty:
        u = EXTRACT-MIN(Q)
        for each (u, v, w) in adj[u]:
            if v ∈ Q and w < key[v]:
                parent[v] = u
                DECREASE-KEY(Q, v, w)
    return {(v, parent[v]) : v ∈ V \ {s}}
```
| Priority Queue | EXTRACT-MIN | DECREASE-KEY | Total |
|---|---|---|---|
| Binary heap | O(log V) | O(log V) | O(E log V) |
| Fibonacci heap | O(log V) amort. | O(1) amort. | O(E + V log V) |
| Unsorted array | O(V) | O(1) | O(V²) |
Binary heap gives O(E log V) — excellent for sparse graphs
Unsorted array gives O(V²) — for dense graphs (E ≈ V²) this matches the Fibonacci heap's O(E + V log V) = O(V²) and avoids its overhead
Fibonacci heap gives the best asymptotic bound for dense graphs, but constant factors make it slower in practice for most inputs
Prim's with a binary heap has the same O(E log V) complexity as Kruskal's. Choose Prim's when the graph is given as an adjacency list and you want to avoid sorting all edges.
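A practical variant of the pseudocode uses Python's heapq, which has no DECREASE-KEY; the standard substitution is lazy deletion — push duplicate entries and skip stale ones on extraction. A sketch, run on the graph from the worked example:

```python
import heapq

def prim(adj, start):
    """Prim's algorithm with a binary heap (lazy deletion in place of
    DECREASE-KEY). adj maps vertex -> list of (neighbour, weight)."""
    visited = {start}
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in visited:            # stale entry: endpoint already in the tree
            continue
        visited.add(v)
        mst.append((u, v, w))
        for x, wx in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return mst

adj = {
    "A": [("B", 2), ("C", 4)],
    "B": [("A", 2), ("C", 3), ("D", 1)],
    "C": [("A", 4), ("B", 3), ("E", 5)],
    "D": [("B", 1)],
    "E": [("C", 5)],
}
print(sum(w for _, _, w in prim(adj, "A")))  # 11 — same weight as Kruskal's
```

Lazy deletion keeps the O(E log V) bound: at most E entries ever enter the heap.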
Borůvka's algorithm runs in O(log V) phases — each phase at least halves the number of components — and each phase costs O(E) work, for O(E log V) total.

| Property | Benefit |
|---|---|
| Parallelisable | Each component acts independently per phase |
| No sorting | Unlike Kruskal's, edges need not be sorted |
| Merge-friendly | Natural fit for distributed and external-memory settings |
Borůvka's is the basis of the randomised O(E) expected-time algorithm by Karger-Klein-Tarjan (1995).
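A compact sketch of Borůvka's phases in Python (it assumes distinct edge weights, the usual precondition for the component-merging step):

```python
def boruvka(n, edges):
    """Borůvka's algorithm: in each phase, every component selects its
    lightest outgoing edge; components then merge. O(log n) phases."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst, components = [], n
    while components > 1:
        cheapest = {}                       # component root -> lightest outgoing edge
        for u, v, w in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][2]:
                    cheapest[r] = (u, v, w)
        for u, v, w in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:                    # may already have merged this phase
                parent[ru] = rv
                mst.append((u, v, w))
                components -= 1
    return mst

# Same graph as the worked example, with vertices A..E as 0..4
edges = [(0, 1, 2), (0, 2, 4), (1, 2, 3), (1, 3, 1), (2, 4, 5)]
print(sum(w for _, _, w in boruvka(5, edges)))  # 11
```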
Invariant: at every step, the edge set A is a subset of some MST T*.
A = {} is trivially a subset of any MST.
Assume A ⊆ T*. We add a light edge e = (u, v) crossing a cut that respects A. If e ∈ T*, the invariant holds immediately. Otherwise, T* ∪ {e} contains a cycle with another edge e' crossing the cut; since w(e) ≤ w(e'), the tree T* ∪ {e} \ {e'} is also an MST, and it contains A ∪ {e}.
When |A| = |V| - 1, A spans all vertices and is a tree.
This single proof covers Kruskal's, Prim's, and Borůvka's — they differ only in how the cut is chosen.
| Condition | Unique? |
|---|---|
| All edge weights distinct | Yes — provably unique |
| Some equal weights, but no cycle has two edges of equal max weight | Yes |
| Equal weights with ambiguous cuts | Possibly multiple MSTs |
When edge weights come from real-world measurements (floating point), ties are essentially impossible — the MST is almost always unique.
Suppose all edge weights are distinct and two distinct MSTs T1 and T2 exist. Let e be the lightest edge that lies in exactly one of them, say T1. Adding e to T2 creates a cycle, which contains some edge e' not in T1. By the choice of e and the distinctness of weights, w(e') > w(e), so replacing e' with e in T2 yields a lighter spanning tree — contradiction.
A bottleneck spanning tree minimises the maximum edge weight across all spanning trees. The bottleneck value is the weight of the heaviest edge in the tree.
Every minimum spanning tree is a bottleneck spanning tree (but not necessarily vice versa).
The path between any two vertices in the MST minimises the maximum edge weight among all paths — useful in network routing where the bottleneck link matters.
Let T_MST be an MST and T_B a bottleneck spanning tree. Suppose the bottleneck of T_MST is edge e with weight w(e), while the bottleneck of T_B is w_B < w(e). Removing e splits T_MST into two components; T_B must contain some edge e' crossing this cut, with w(e') ≤ w_B < w(e). Exchanging e for e' in T_MST would give a lighter spanning tree — contradicting minimality. Hence w_B ≥ w(e).
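The minimax-path property is easy to exploit in code: the bottleneck between two vertices is the heaviest edge on their unique MST path. A sketch, reusing the MST from the worked example (the adjacency structure here is hand-built from those four edges):

```python
def minimax_edge(mst_adj, s, t):
    """Maximum edge weight on the unique s-t path in a tree,
    found by iterative DFS. mst_adj maps vertex -> [(neighbour, weight)]."""
    stack = [(s, None, 0)]          # (vertex, came_from, max edge so far)
    while stack:
        v, prev, best = stack.pop()
        if v == t:
            return best
        for u, w in mst_adj[v]:
            if u != prev:           # trees have no cycles, so this suffices
                stack.append((u, v, max(best, w)))
    return None                     # t unreachable (not a spanning tree)

# MST of the running example: B-D(1), A-B(2), B-C(3), C-E(5)
mst_adj = {
    "A": [("B", 2)],
    "B": [("A", 2), ("D", 1), ("C", 3)],
    "C": [("B", 3), ("E", 5)],
    "D": [("B", 1)],
    "E": [("C", 5)],
}
print(minimax_edge(mst_adj, "D", "E"))  # 5 — bottleneck on the path D-B-C-E
```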
Given a graph G = (V, E) and a subset S ⊆ V of terminal vertices, find the minimum-weight tree connecting all terminals. Non-terminal vertices (Steiner points) may be included.
| Property | MST | Steiner Tree |
|---|---|---|
| Must span | All vertices in V | Only terminals in S |
| May include | All vertices | Any subset of V \ S |
| Complexity | Polynomial | NP-hard |
| Approximation | Exact | 2-approx via MST |
Taking the MST of the metric closure over the terminals yields a Steiner tree at most twice the weight of the optimal.
Steiner trees arise in VLSI chip design, network cable routing, and phylogenetic tree reconstruction.
| Graph type | Edge count | Example |
|---|---|---|
| Sparse | E = O(V) | Road networks, planar graphs |
| Moderate | E = O(V log V) | Social network subgraphs |
| Dense | E = O(V²) | Complete or near-complete graphs |
| Scenario | Best algorithm | Complexity |
|---|---|---|
| Sparse (E ~ V) | Kruskal's + Union-Find | O(V log V) |
| Dense (E ~ V²) | Prim's + unsorted array | O(V²) |
| Very large, parallel | Borůvka's | O(E log V) |
| Expected linear | KKT randomised | O(E) expected |
Chazelle's algorithm achieves O(E · α(E, V)) — nearly linear
Karger-Klein-Tarjan achieves O(E) expected time
Does a deterministic O(E) MST algorithm exist? This remains one of the most tantalising open questions in combinatorial optimisation.
Each node in a network knows only its own edges and their weights. Nodes communicate by passing messages along edges. Goal: compute the MST cooperatively.
The GHS algorithm (Gallager, Humblet, Spira, 1983) is a distributed version of Borůvka's algorithm:
| Metric | Cost |
|---|---|
| Message complexity | O(E + V log V) |
| Time complexity | O(V log V) |
| Phases | O(log V) |
Fragments have levels. Two fragments at the same level merge and increase level by 1. A lower-level fragment is absorbed into a higher-level one without level increase. This prevents O(V) phases.
GHS is a foundational algorithm in distributed computing — it demonstrates that global structure (MST) can be computed with only local information and message passing.
Connect cities with minimum total cable length. The MST gives the cheapest connected network. Variants include degree-constrained MST and capacitated MST.
Road networks, pipeline routing, power grid design — MST provides a baseline minimum-cost connected layout before adding redundancy.
MST is used as a subroutine in approximation algorithms for harder problems:
| Problem | MST role |
|---|---|
| Metric TSP | MST gives 2-approx; Christofides uses MST + matching for 1.5-approx |
| Steiner tree | MST of terminal metric closure gives 2-approx |
| k-connectivity | Augment MST with additional edges |
MST algorithms have been critical infrastructure in network engineering since the 1950s — the algorithms predate the Internet itself.
Removing the k − 1 heaviest edges from the MST partitions the data into k clusters. This is equivalent to single-linkage hierarchical clustering and produces a dendrogram naturally from the MST.
Felzenszwalb & Huttenlocher (2004) built an efficient graph-based image segmentation algorithm on Kruskal-style region merging over pixel-similarity edges.
| Domain | Application |
|---|---|
| Bioinformatics | Phylogenetic tree estimation from genetic distances |
| Robotics | Coverage path planning over sensor networks |
| Finance | MST of asset correlation matrix reveals market structure |
| Cartography | Simplifying geographic networks for visualisation |
MST-based clustering is non-parametric — no need to specify the number of clusters in advance. The dendrogram allows cutting at any level. It naturally detects elongated, non-convex clusters that k-means misses.
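Single-linkage clustering via the MST can be sketched in a few lines: build the complete-graph MST on pairwise distances, then delete the k − 1 heaviest MST edges. A toy example on 1-D points (the function and data are illustrative):

```python
def mst_clusters(points, k):
    """Cluster 1-D points by building the complete-graph MST on pairwise
    distances, then removing the k-1 heaviest MST edges (single linkage)."""
    n = len(points)
    edges = sorted(
        (abs(points[i] - points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    # Kruskal to collect the MST edges
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # Keep only the n-k lightest MST edges, then read off components
    mst.sort()
    parent = list(range(n))
    for w, i, j in mst[: n - k]:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(groups.values())

print(mst_clusters([1.0, 1.2, 1.1, 8.0, 8.3, 15.0], 3))
# [[1.0, 1.2, 1.1], [8.0, 8.3], [15.0]]
```

Cutting a different number of edges corresponds to cutting the dendrogram at a different level — no re-clustering needed.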
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Kruskal's | O(E log E) | O(E + V) | Sort edges; Union-Find. Best for sparse graphs |
| Prim's (binary heap) | O(E log V) | O(V) | Grow single tree. Best for adjacency lists |
| Prim's (Fibonacci) | O(E + V log V) | O(V) | Best asymptotic for dense graphs |
| Prim's (array) | O(V²) | O(V) | Simplest for dense adjacency matrices |
| Borůvka's | O(E log V) | O(E + V) | Parallelisable; no sorting needed |
| Chazelle | O(E · α(E,V)) | O(E + V) | Nearly linear; complex implementation |
| KKT randomised | O(E) expected | O(E + V) | Optimal expected; verification subroutine |
The cut property is the single principle behind all greedy MST algorithms
Kruskal's and Prim's are the practical workhorses — choose based on graph density
MST is a subroutine in approximation algorithms for TSP, Steiner tree, and clustering
Recommended: CLRS ch. 21 & 23 · Kleinberg & Tardos ch. 4 · Tarjan, Data Structures and Network Algorithms
Papers: Karger-Klein-Tarjan (1995) · GHS (1983) · Felzenszwalb & Huttenlocher (2004)