COMPUTER SCIENCE FUNDAMENTALS SERIES

Minimum Spanning Trees

Kruskal's · Prim's · Borůvka's · Cut property · Union-Find · Network design
Mid-level software engineer track  ·  20 slides
02

Spanning Trees — Definition & Properties

What is a spanning tree?

Given a connected, undirected graph G = (V, E), a spanning tree T = (V, E') is a subgraph that:

  • Contains all vertices in V
  • Is connected
  • Is acyclic (a tree)
  • Has exactly |V| - 1 edges

Minimum spanning tree

An MST is a spanning tree whose total edge weight is minimised among all spanning trees of G.

Property     Value
Edges        Exactly |V| - 1
Cycles       None — adding any edge creates exactly one cycle
Cuts         Removing any edge disconnects into exactly two components
Uniqueness   Unique if all edge weights are distinct

A graph can have exponentially many spanning trees. Cayley's formula: a complete graph Kn has n^(n−2) spanning trees.
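For small n, Cayley's formula can be checked directly by brute force; a sketch that enumerates every (n−1)-edge subset of Kn and counts the acyclic ones (an acyclic set of n−1 edges on n vertices is necessarily a spanning tree):

```python
from itertools import combinations

def count_spanning_trees(n):
    """Brute-force count of spanning trees of K_n: try every
    (n-1)-edge subset and keep the acyclic ones, checked with
    a tiny union-find."""
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:          # edge closes a cycle
                ok = False
                break
            parent[ru] = rv
        if ok:                    # n-1 acyclic edges => spanning tree
            count += 1
    return count

for n in range(2, 6):
    assert count_spanning_trees(n) == n ** (n - 2)   # Cayley's formula
```

This is exponential in n, of course — it is a verification of the formula, not a counting algorithm.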

03

Cut Property

Definition

A cut partitions vertices into two non-empty sets (S, V \ S). A crossing edge has one endpoint in each set. A light edge is a crossing edge of minimum weight.

Cut property theorem

For any cut (S, V \ S), the minimum-weight crossing edge belongs to some MST.

The cut property is the foundation of every greedy MST algorithm — Kruskal's, Prim's, and Borůvka's all exploit it.

Proof sketch

Assume the light edge e = (u, v) is not in MST T.

  • Adding e to T creates a cycle
  • This cycle must contain another crossing edge e'
  • Since w(e) ≤ w(e'), swapping e' for e yields T' with w(T') ≤ w(T)
  • Therefore T' is also an MST containing e
[Figure: cut (S, V \ S) with light edge e = (u, v) crossing it and a heavier crossing edge e']
04

Cycle Property

Definition

For any cycle C in graph G, the maximum-weight edge on C does not belong to any MST (assuming distinct weights).

Proof sketch

Assume the heaviest edge e on cycle C is in MST T. Removing e splits T into two components. The cycle C must contain another edge e' crossing this cut. Since w(e') < w(e), replacing e with e' gives a lighter spanning tree — contradiction.

Red rule / Blue rule

Rule              Description
Blue rule (cut)   Colour the lightest crossing edge of any cut blue — it is in the MST
Red rule (cycle)  Colour the heaviest edge in any cycle red — it is not in the MST

Applying blue and red rules in any order until all edges are coloured correctly identifies the MST. This is the basis of the generic MST algorithm.

05

Generic MST Algorithm

The abstract framework

All MST algorithms follow this pattern:

A = {}                              // MST edges
while A does not form a spanning tree:
    find a cut (S, V\S) that respects A
    add the light edge crossing (S, V\S) to A
return A

A cut respects a set A of edges if no edge in A crosses the cut.

Key insight

At every step, the algorithm maintains the invariant that A is a subset of some MST. The cut property guarantees the light edge is safe to add.

Three instantiations

Kruskal's

Sort all edges; process in weight order — each edge defines a cut between components

Prim's

Grow one tree; the cut is always (tree vertices, non-tree vertices)

Borůvka's

Each component finds its lightest outgoing edge simultaneously

06

Kruskal's Algorithm

Algorithm

  1. Sort all edges by weight
  2. For each edge (u, v) in sorted order: if u and v are in different components, add edge to MST; otherwise skip
  3. Stop when |V| - 1 edges are added
KRUSKAL(G):
    sort edges by weight
    T = {}
    for each (u, v, w) in sorted edges:
        if FIND(u) != FIND(v):
            T = T ∪ {(u,v)}
            UNION(u, v)
    return T
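The pseudocode translates almost line-for-line into Python. A minimal sketch with a simplified FIND/UNION (path halving, no ranks); the example graph at the bottom is illustrative, not one from these slides:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with vertices 0..n-1.
    Returns (total_weight, list of MST edges)."""
    parent = list(range(n))

    def find(x):                       # FIND with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # process edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # different components: safe edge
            parent[ru] = rv            # UNION
            mst.append((u, v))
            total += w
            if len(mst) == n - 1:      # early exit: tree complete
                break
    return total, mst

# Illustrative graph: cycle 0-1-2-3-0 plus one diagonal
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
total, mst = kruskal(4, edges)
print(total, mst)   # 6 [(0, 1), (1, 2), (2, 3)]
```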

Complexity

Operation              Cost
Sort edges             O(E log E)
Union-Find operations  O(E α(V)) ≈ O(E)
Total                  O(E log E) = O(E log V)

Kruskal's is ideal for sparse graphs where E is close to V. The sorting step dominates.

07

Union-Find Data Structure

Operations

Operation    Description
MAKE-SET(x)  Create a singleton set containing x
FIND(x)      Return the representative (root) of the set containing x
UNION(x, y)  Merge the sets containing x and y

Optimisations

  • Union by rank — attach the shorter tree under the taller tree's root
  • Path compression — during FIND, make every visited node point directly to the root
FIND(x):
    if x != parent[x]:
        parent[x] = FIND(parent[x])  // path compression
    return parent[x]

UNION(x, y):
    rx, ry = FIND(x), FIND(y)
    if rank[rx] < rank[ry]: swap(rx, ry)
    parent[ry] = rx               // union by rank
    if rank[rx] == rank[ry]: rank[rx]++
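The two routines above as a runnable Python class (the class name is mine, not from any particular library):

```python
class DisjointSet:
    """Union-Find with union by rank and path compression."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:                        # path compression:
            self.parent[x] = self.find(self.parent[x]) # point straight at root
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                    # already in the same set
        if self.rank[rx] < self.rank[ry]:   # union by rank:
            rx, ry = ry, rx                 # make rx the taller root
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet(5)
ds.union(0, 1); ds.union(3, 4)
print(ds.find(0) == ds.find(1))   # True
print(ds.find(1) == ds.find(3))   # False
```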

Amortised complexity

With both optimisations, m operations on n elements cost O(m · α(n)), where α is the inverse Ackermann function — effectively O(1) per operation.

08

Kruskal's — Worked Example

Graph

[Figure: example graph on vertices A–E with seven weighted edges (weights 1–7)]

Execution trace

Step  Edge  Wt  Action  Components
1     B-D   1   Add     {A} {B,D} {C} {E}
2     A-B   2   Add     {A,B,D} {C} {E}
3     B-C   3   Add     {A,B,C,D} {E}
4     A-C   4   Skip    {A,B,C,D} {E}
5     C-E   5   Add     {A,B,C,D,E}

MST edges: {B-D, A-B, B-C, C-E} — total weight = 11

4 edges for 5 vertices — exactly |V| - 1.

09

Prim's Algorithm

Algorithm

  1. Start from any vertex s; initialise priority queue with all vertices (key = ∞)
  2. Set key[s] = 0
  3. While the priority queue is not empty: extract vertex u with minimum key; for each neighbour v of u, if v is in the queue and w(u,v) < key[v], update key[v] and parent[v]

Prim's always grows a single tree. At each step the cut is (tree vertices, non-tree vertices) and the light edge is selected by the priority queue.

PRIM(G, s):
    for each v in V:
        key[v] = ∞, parent[v] = NIL
    key[s] = 0
    Q = min-priority-queue(V, key)
    while Q is not empty:
        u = EXTRACT-MIN(Q)
        for each (u, v, w) in adj[u]:
            if v ∈ Q and w < key[v]:
                parent[v] = u
                DECREASE-KEY(Q, v, w)
    return {(v, parent[v]) : v ∈ V \ {s}}
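A runnable Python sketch: the standard-library heap (heapq) has no DECREASE-KEY, so this version pushes duplicate entries and skips vertices already in the tree (lazy deletion), which keeps the same O(E log V) bound; the example graph is illustrative:

```python
import heapq

def prim(adj, s=0):
    """adj: {u: [(v, w), ...]} adjacency lists.
    Returns (total_weight, list of MST edges)."""
    in_tree = set()
    mst, total = [], 0
    heap = [(0, s, None)]              # (key, vertex, parent)
    while heap and len(in_tree) < len(adj):
        w, u, p = heapq.heappop(heap)
        if u in in_tree:
            continue                   # stale duplicate entry: skip
        in_tree.add(u)
        if p is not None:
            mst.append((p, u))
            total += w
        for v, wv in adj[u]:
            if v not in in_tree:       # crossing edge: candidate
                heapq.heappush(heap, (wv, v, u))
    return total, mst

# Illustrative graph (undirected, listed both ways)
adj = {0: [(1, 2), (2, 4)], 1: [(0, 2), (2, 1), (3, 3)],
       2: [(0, 4), (1, 1), (3, 5)], 3: [(1, 3), (2, 5)]}
total, mst = prim(adj)
print(total)   # 6
```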
10

Prim's — Priority Queue Implementation

Complexity depends on the priority queue

Priority queue   EXTRACT-MIN      DECREASE-KEY   Total
Binary heap      O(log V)         O(log V)       O(E log V)
Fibonacci heap   O(log V) amort.  O(1) amort.    O(E + V log V)
Unsorted array   O(V)             O(1)           O(V²)

Sparse graphs (E ~ V)

Binary heap gives O(V log V) — excellent

Dense graphs (E ~ V²)

Unsorted array gives O(V²) — matches binary heap and avoids overhead

Fibonacci heap

Best asymptotic for dense graphs, but constant factors make it slower in practice for most inputs

Prim's with a binary heap has the same O(E log V) complexity as Kruskal's. Choose Prim's when the graph is given as an adjacency list and you want to avoid sorting all edges.

11

Borůvka's Algorithm

Algorithm (1926 — oldest MST algorithm)

  1. Start with each vertex as its own component
  2. In each phase, for every component: find the lightest edge leaving the component
  3. Add all such edges (handle ties consistently to avoid cycles)
  4. Repeat until only one component remains
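The four steps above as a runnable Python sketch (distinct edge weights assumed — the simplest way to handle ties; the example graph is illustrative):

```python
def boruvka(n, edges):
    """edges: list of (w, u, v) with distinct weights.
    Returns total MST weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        # phase: cheapest outgoing edge for every component
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                 # internal edge: ignore
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, ru, rv)
        # add all selected edges at once
        for w, ru, rv in cheapest.values():
            if find(ru) != find(rv):     # may already have merged this phase
                parent[find(ru)] = find(rv)
                total += w
                components -= 1
    return total

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
print(boruvka(4, edges))   # 6
```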

Complexity analysis

  • Each phase at least halves the number of components
  • Therefore at most O(log V) phases
  • Each phase scans all edges: O(E)
  • Total: O(E log V)

Advantages

Property         Benefit
Parallelisable   Each component acts independently per phase
No sorting       Unlike Kruskal's, edges need not be sorted
Merge-friendly   Natural fit for distributed and external-memory settings

Borůvka's is the basis of the randomised O(E) expected-time algorithm by Karger-Klein-Tarjan (1995).

12

Proof of Correctness

Theorem: the generic MST algorithm produces a minimum spanning tree

Invariant: at every step, the edge set A is a subset of some MST T*.

Base case

A = {} is trivially a subset of any MST.

Inductive step

Assume A ⊆ T*. We add a light edge e = (u, v) crossing a cut that respects A.

  • Case 1: e ∈ T*. Then A ∪ {e} ⊆ T*.
  • Case 2: e ∉ T*. Adding e to T* creates a cycle C containing another crossing edge e'. Since w(e) ≤ w(e'), swapping gives T** with w(T**) ≤ w(T*). So T** is an MST and A ∪ {e} ⊆ T**.

Termination

When |A| = |V| - 1, A spans all vertices and is a tree.

This single proof covers Kruskal's, Prim's, and Borůvka's — they differ only in how the cut is chosen.

13

MST Uniqueness Conditions

When is the MST unique?

Condition                                                            Unique?
All edge weights distinct                                            Yes — provably unique
Some equal weights, but no cycle has two edges of equal max weight   Yes
Equal weights with ambiguous cuts                                    Possibly multiple MSTs

Practical implication

When edge weights come from real-world measurements (floating point), ties are essentially impossible — the MST is almost always unique.

Proof of uniqueness with distinct weights

Suppose two distinct MSTs T1 and T2 exist. Let e be the lightest edge that lies in exactly one of the two trees; since all weights are distinct, this e is well defined. Say e ∈ T1.

  • Adding e to T2 creates a cycle
  • Not every edge of this cycle can lie in T1 (otherwise T1 would contain the whole cycle), so some cycle edge e' is in T2 but not in T1
  • Both e and e' lie in exactly one tree, and e was chosen as the lightest such edge, so w(e) < w(e')
  • Replacing e' with e in T2 gives a lighter spanning tree — contradicting T2 being an MST
14

Bottleneck Spanning Trees

Definition

A bottleneck spanning tree minimises the maximum edge weight across all spanning trees. The bottleneck value is the weight of the heaviest edge in the tree.

Key theorem

Every minimum spanning tree is a bottleneck spanning tree (but not necessarily vice versa).

Application

Minimax path problem

The path between any two vertices in the MST minimises the maximum edge weight among all paths — useful in network routing where the bottleneck link matters.
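One concrete way to compute this minimax bottleneck follows directly from the Kruskal view: processing edges in weight order, s and t become connected exactly when the bottleneck edge of their minimax path is added. A sketch (the example graph is illustrative):

```python
def minimax_path_weight(n, edges, s, t):
    """Weight of the heaviest edge on the minimax s-t path:
    run Kruskal's edge scan and report the weight at which
    s and t first fall into the same component.
    edges: list of (w, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
        if find(s) == find(t):
            return w          # bottleneck of the minimax path
    return None               # s and t are disconnected

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
print(minimax_path_weight(4, edges, 0, 3))   # 3 (path 0-1-2-3 beats direct edge of weight 4)
```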

Proof

Let TMST be an MST and TB be a bottleneck spanning tree. Suppose the bottleneck of TMST is edge e with weight w(e), and the bottleneck of TB is wB < w(e).

  • Removing e from TMST splits V into (S, V\S)
  • Any spanning tree must cross this cut, so TB has a crossing edge e' with w(e') ≤ wB < w(e)
  • Replacing e with e' in TMST yields a lighter spanning tree
  • This contradicts TMST being an MST
15

Steiner Trees

The problem

Given a graph G = (V, E) and a subset S ⊆ V of terminal vertices, find the minimum-weight tree connecting all terminals. Non-terminal vertices (Steiner points) may be included.

MST vs Steiner tree

Property       MST                 Steiner tree
Must span      All vertices in V   Only terminals in S
May include    All vertices        Any subset of V \ S
Complexity     Polynomial          NP-hard
Approximation  Exact               2-approx via MST

The MST heuristic (2-approximation)

  1. Compute shortest paths between all pairs of terminals
  2. Build a complete graph on S with shortest-path distances as weights
  3. Find the MST of this complete graph
  4. Map edges back to paths in G; remove redundant edges

The Steiner tree found is at most twice the weight of the optimal.
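Steps 1–3 of the heuristic as a runnable Python sketch (step 4, mapping closure edges back to paths in G, is omitted for brevity; the star graph at the bottom is an illustrative example where a Steiner point helps):

```python
import itertools

def steiner_upper_bound(n, edges, terminals):
    """MST heuristic: all-pairs shortest paths (Floyd-Warshall),
    then Kruskal on the terminal metric closure. The returned
    weight is at most twice the optimal Steiner tree weight."""
    INF = float('inf')
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for w, u, v in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):                 # Floyd-Warshall
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # Kruskal on the complete graph over terminals
    closure = sorted((d[a][b], a, b)
                     for a, b in itertools.combinations(terminals, 2))
    parent = {t: t for t in terminals}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0
    for w, a, b in closure:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

# Star: terminals 0, 1, 2 all attach to hub 3 (a Steiner point)
edges = [(1, 0, 3), (1, 1, 3), (1, 2, 3)]
print(steiner_upper_bound(4, edges, [0, 1, 2]))   # 4 (optimum via the hub is 3)
```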

Steiner trees arise in VLSI chip design, network cable routing, and phylogenetic tree reconstruction.

16

MST in Dense vs Sparse Graphs

Definitions

Graph type  Edge count      Example
Sparse      E = O(V)        Road networks, planar graphs
Moderate    E = O(V log V)  Social network subgraphs
Dense       E = O(V²)       Complete or near-complete graphs

Algorithm selection

Scenario              Best algorithm           Complexity
Sparse (E ~ V)        Kruskal's + Union-Find   O(V log V)
Dense (E ~ V²)        Prim's + unsorted array  O(V²)
Very large, parallel  Borůvka's                O(E log V)
Expected linear       KKT randomised           O(E) expected

Theoretical frontier

Deterministic

Chazelle's algorithm achieves O(E · α(E, V)) — nearly linear

Randomised

Karger-Klein-Tarjan achieves O(E) expected time

Open problem

Does a deterministic O(E) MST algorithm exist? This remains one of the most tantalising open questions in combinatorial optimisation.

17

Distributed MST — GHS Algorithm

The setting

Each node in a network knows only its own edges and their weights. Nodes communicate by passing messages along edges. Goal: compute the MST cooperatively.

Gallager-Humblet-Spira (1983)

A distributed version of Borůvka's algorithm:

  1. Each node starts as its own fragment (component)
  2. Each fragment finds its minimum-weight outgoing edge (MWOE)
  3. Fragments merge along their MWOEs
  4. Repeat until one fragment spans the network

Complexity

Metric              Cost
Message complexity  O(E + V log V)
Time complexity     O(V log V)
Phases              O(log V)

Level-based merging

Fragments have levels. Two fragments at the same level merge and increase level by 1. A lower-level fragment is absorbed into a higher-level one without level increase. This prevents O(V) phases.

GHS is a foundational algorithm in distributed computing — it demonstrates that global structure (MST) can be computed with only local information and message passing.

18

Applications — Network Design

Telecommunications

Connect cities with minimum total cable length. The MST gives the cheapest connected network. Variants include degree-constrained MST and capacitated MST.

Computer networks

  • Spanning Tree Protocol (STP) — Ethernet switches compute a spanning tree to prevent broadcast loops
  • Overlay networks — multicast trees for content distribution built as MSTs over latency graphs

Transportation

Road networks, pipeline routing, power grid design — MST provides a baseline minimum-cost connected layout before adding redundancy.

Approximation backbone

MST is used as a subroutine in approximation algorithms for harder problems:

Problem         MST role
Metric TSP      MST gives 2-approx; Christofides uses MST + matching for 1.5-approx
Steiner tree    MST of terminal metric closure gives 2-approx
k-connectivity  Augment MST with additional edges

MST algorithms have been critical infrastructure in network engineering since the 1950s — the algorithms predate the Internet itself.

19

Applications — Clustering & Segmentation

Single-linkage clustering

  1. Build the MST of the data points (using pairwise distances as edge weights)
  2. Remove the k - 1 heaviest edges
  3. The resulting k connected components are the clusters

This is equivalent to single-linkage hierarchical clustering and produces a dendrogram naturally from the MST.
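The three steps above as a runnable Python sketch over 2-D points (the function and helper names are mine):

```python
from itertools import combinations

def mst_clusters(points, k):
    """Single-linkage clustering via the MST: build the MST of the
    complete Euclidean graph (Kruskal), drop the k-1 heaviest MST
    edges, and return the resulting components as clusters."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:              # Kruskal
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # keep the n-k lightest MST edges; the k-1 heaviest are the cuts
    parent = list(range(n))
    for w, i, j in sorted(mst)[: n - k]:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(pts, 2))   # [[0, 1, 2], [3, 4]]
```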

Image segmentation

Felzenszwalb & Huttenlocher (2004):

  1. Build a graph where pixels are vertices and edges connect adjacent pixels with weight = colour difference
  2. Run a modified Kruskal's: merge components if the edge weight is below an adaptive threshold
  3. Result: over-segmented regions that respect strong boundaries

Other applications

Domain          Application
Bioinformatics  Phylogenetic tree estimation from genetic distances
Robotics        Coverage path planning over sensor networks
Finance         MST of asset correlation matrix reveals market structure
Cartography     Simplifying geographic networks for visualisation
CartographySimplifying geographic networks for visualisation

Why MST for clustering?

MST-based clustering is non-parametric — no need to specify the number of clusters in advance. The dendrogram allows cutting at any level. It naturally detects elongated, non-convex clusters that k-means misses.

20

Complexity Comparison & Summary

Algorithm comparison

Algorithm             Time            Space     Notes
Kruskal's             O(E log E)      O(E + V)  Sort edges; Union-Find. Best for sparse graphs
Prim's (binary heap)  O(E log V)      O(V)      Grow single tree. Best for adjacency lists
Prim's (Fibonacci)    O(E + V log V)  O(V)      Best asymptotic for dense graphs
Prim's (array)        O(V²)           O(V)      Simplest for dense adjacency matrices
Borůvka's             O(E log V)      O(E + V)  Parallelisable; no sorting needed
Chazelle              O(E · α(E,V))   O(E + V)  Nearly linear; complex implementation
KKT randomised        O(E) expected   O(E + V)  Optimal expected; verification subroutine

Key takeaways

The cut property is the single principle behind all greedy MST algorithms

Kruskal's and Prim's are the practical workhorses — choose based on graph density

MST is a subroutine in approximation algorithms for TSP, Steiner tree, and clustering

Recommended: CLRS ch. 21 & 23 · Kleinberg & Tardos ch. 4 · Tarjan, Data Structures and Network Algorithms

Papers: Karger-Klein-Tarjan (1995) · GHS (1983) · Felzenszwalb & Huttenlocher (2004)