COMPUTER SCIENCE FUNDAMENTALS SERIES

Minimum Spanning Trees

Kruskal's · Prim's · Borůvka's · Cut property · Union-Find · Network design
Mid-level software engineer track  ·  20 slides
02

Spanning Trees — Definition & Properties

What is a spanning tree?

Given a connected, undirected graph G = (V, E), a spanning tree T = (V, E') is a subgraph that:

  • Contains all vertices in V
  • Is connected
  • Is acyclic (a tree)
  • Has exactly |V| - 1 edges

Minimum spanning tree

An MST is a spanning tree whose total edge weight is minimised among all spanning trees of G.

Property     Value
Edges        Exactly |V| - 1
Cycles       None — adding any edge creates exactly one cycle
Cuts         Removing any edge disconnects into exactly two components
Uniqueness   Unique if all edge weights are distinct

A graph can have exponentially many spanning trees. Cayley's formula: a complete graph Kn has n^(n−2) spanning trees.
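For small n, Cayley's formula can be checked directly by brute force; a sketch that enumerates every (n−1)-edge subset of Kn and counts the acyclic ones (an acyclic set of n−1 edges on n vertices is necessarily a spanning tree):

```python
from itertools import combinations

def count_spanning_trees(n):
    """Brute-force count of spanning trees of K_n: try every
    (n-1)-edge subset and keep the acyclic ones, checked with
    a tiny union-find."""
    edges = list(combinations(range(n), 2))
    count = 0
    for subset in combinations(edges, n - 1):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        ok = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:          # edge closes a cycle
                ok = False
                break
            parent[ru] = rv
        if ok:                    # n-1 acyclic edges => spanning tree
            count += 1
    return count

for n in range(2, 6):
    assert count_spanning_trees(n) == n ** (n - 2)   # Cayley's formula
```

This is exponential in n, of course — it is a verification of the formula, not a counting algorithm.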

03

Cut Property

Definition

A cut partitions vertices into two non-empty sets (S, V \ S). A crossing edge has one endpoint in each set. A light edge is a crossing edge of minimum weight.

Cut property theorem

For any cut (S, V \ S), the minimum-weight crossing edge belongs to some MST.

The cut property is the foundation of every greedy MST algorithm — Kruskal's, Prim's, and Borůvka's all exploit it.

Proof sketch

Assume the light edge e = (u, v) is not in MST T.

  • Adding e to T creates a cycle
  • This cycle must contain another crossing edge e'
  • Since w(e) ≤ w(e'), swapping e' for e yields T' with w(T') ≤ w(T)
  • Therefore T' is also an MST containing e
[Figure: cut (S, V \ S) with light edge e = (u, v) crossing it and a heavier crossing edge e']
04

Cycle Property

Definition

For any cycle C in graph G, the maximum-weight edge on C does not belong to any MST (assuming distinct weights).

Proof sketch

Assume the heaviest edge e on cycle C is in MST T. Removing e splits T into two components. The cycle C must contain another edge e' crossing this cut. Since w(e') < w(e), replacing e with e' gives a lighter spanning tree — contradiction.

Red rule / Blue rule

Rule              Description
Blue rule (cut)   Colour the lightest crossing edge of any cut blue — it is in the MST
Red rule (cycle)  Colour the heaviest edge in any cycle red — it is not in the MST

Applying blue and red rules in any order until all edges are coloured correctly identifies the MST. This is the basis of the generic MST algorithm.

05

Generic MST Algorithm

The abstract framework

All MST algorithms follow this pattern:

A = {}                              // MST edges
while A does not form a spanning tree:
    find a cut (S, V\S) that respects A
    add the light edge crossing (S, V\S) to A
return A

A cut respects a set A of edges if no edge in A crosses the cut.

Key insight

At every step, the algorithm maintains the invariant that A is a subset of some MST. The cut property guarantees the light edge is safe to add.

Three instantiations

Kruskal's

Sort all edges; process in weight order — each edge defines a cut between components

Prim's

Grow one tree; the cut is always (tree vertices, non-tree vertices)

Borůvka's

Each component finds its lightest outgoing edge simultaneously

06

Kruskal's Algorithm

Algorithm

  1. Sort all edges by weight
  2. For each edge (u, v) in sorted order: if u and v are in different components, add edge to MST; otherwise skip
  3. Stop when |V| - 1 edges are added
KRUSKAL(G):
    sort edges by weight
    T = {}
    for each (u, v, w) in sorted edges:
        if FIND(u) != FIND(v):
            T = T ∪ {(u,v)}
            UNION(u, v)
    return T
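The pseudocode translates almost line-for-line into Python. A minimal sketch with a simplified FIND/UNION (path halving, no ranks); the example graph at the bottom is illustrative, not one from these slides:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with vertices 0..n-1.
    Returns (total_weight, list of MST edges)."""
    parent = list(range(n))

    def find(x):                       # FIND with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):      # process edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # different components: safe edge
            parent[ru] = rv            # UNION
            mst.append((u, v))
            total += w
            if len(mst) == n - 1:      # early exit: tree complete
                break
    return total, mst

# Illustrative graph: cycle 0-1-2-3-0 plus one diagonal
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
total, mst = kruskal(4, edges)
print(total, mst)   # 6 [(0, 1), (1, 2), (2, 3)]
```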

Complexity

Operation              Cost
Sort edges             O(E log E)
Union-Find operations  O(E α(V)) ≈ O(E)
Total                  O(E log E) = O(E log V)

Kruskal's is ideal for sparse graphs where E is close to V. The sorting step dominates.

07

Union-Find Data Structure

Operations

Operation    Description
MAKE-SET(x)  Create a singleton set containing x
FIND(x)      Return the representative (root) of the set containing x
UNION(x, y)  Merge the sets containing x and y

Optimisations

  • Union by rank — attach the shorter tree under the taller tree's root
  • Path compression — during FIND, make every visited node point directly to the root
FIND(x):
    if x != parent[x]:
        parent[x] = FIND(parent[x])  // path compression
    return parent[x]

UNION(x, y):
    rx, ry = FIND(x), FIND(y)
    if rank[rx] < rank[ry]: swap(rx, ry)
    parent[ry] = rx               // union by rank
    if rank[rx] == rank[ry]: rank[rx]++
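The two routines above as a runnable Python class (the class name is mine, not from any particular library):

```python
class DisjointSet:
    """Union-Find with union by rank and path compression."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:                        # path compression:
            self.parent[x] = self.find(self.parent[x]) # point straight at root
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                    # already in the same set
        if self.rank[rx] < self.rank[ry]:   # union by rank:
            rx, ry = ry, rx                 # make rx the taller root
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSet(5)
ds.union(0, 1); ds.union(3, 4)
print(ds.find(0) == ds.find(1))   # True
print(ds.find(1) == ds.find(3))   # False
```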

Amortised complexity

With both optimisations, m operations on n elements cost O(m · α(n)), where α is the inverse Ackermann function — effectively O(1) per operation.

08

Kruskal's — Worked Example

Graph

[Figure: example graph on vertices A–E with seven weighted edges (weights 1–7)]

Execution trace

Step  Edge  Wt  Action  Components
1     B-D   1   Add     {A} {B,D} {C} {E}
2     A-B   2   Add     {A,B,D} {C} {E}
3     B-C   3   Add     {A,B,C,D} {E}
4     A-C   4   Skip    {A,B,C,D} {E}
5     C-E   5   Add     {A,B,C,D,E}

MST edges: {B-D, A-B, B-C, C-E} — total weight = 11

4 edges for 5 vertices — exactly |V| - 1.

09

Prim's Algorithm

Algorithm

  1. Start from any vertex s; initialise priority queue with all vertices (key = ∞)
  2. Set key[s] = 0
  3. While the priority queue is not empty: extract vertex u with minimum key; for each neighbour v of u, if v is in the queue and w(u,v) < key[v], update key[v] and parent[v]

Prim's always grows a single tree. At each step the cut is (tree vertices, non-tree vertices) and the light edge is selected by the priority queue.

PRIM(G, s):
    for each v in V:
        key[v] = ∞, parent[v] = NIL
    key[s] = 0
    Q = min-priority-queue(V, key)
    while Q is not empty:
        u = EXTRACT-MIN(Q)
        for each (u, v, w) in adj[u]:
            if v ∈ Q and w < key[v]:
                parent[v] = u
                DECREASE-KEY(Q, v, w)
    return {(v, parent[v]) : v ∈ V \ {s}}
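A runnable Python sketch: the standard-library heap (heapq) has no DECREASE-KEY, so this version pushes duplicate entries and skips vertices already in the tree (lazy deletion), which keeps the same O(E log V) bound; the example graph is illustrative:

```python
import heapq

def prim(adj, s=0):
    """adj: {u: [(v, w), ...]} adjacency lists.
    Returns (total_weight, list of MST edges)."""
    in_tree = set()
    mst, total = [], 0
    heap = [(0, s, None)]              # (key, vertex, parent)
    while heap and len(in_tree) < len(adj):
        w, u, p = heapq.heappop(heap)
        if u in in_tree:
            continue                   # stale duplicate entry: skip
        in_tree.add(u)
        if p is not None:
            mst.append((p, u))
            total += w
        for v, wv in adj[u]:
            if v not in in_tree:       # crossing edge: candidate
                heapq.heappush(heap, (wv, v, u))
    return total, mst

# Illustrative graph (undirected, listed both ways)
adj = {0: [(1, 2), (2, 4)], 1: [(0, 2), (2, 1), (3, 3)],
       2: [(0, 4), (1, 1), (3, 5)], 3: [(1, 3), (2, 5)]}
total, mst = prim(adj)
print(total)   # 6
```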
10

Prim's — Priority Queue Implementation

Complexity depends on the priority queue

Priority queue   EXTRACT-MIN      DECREASE-KEY   Total
Binary heap      O(log V)         O(log V)       O(E log V)
Fibonacci heap   O(log V) amort.  O(1) amort.    O(E + V log V)
Unsorted array   O(V)             O(1)           O(V²)

Sparse graphs (E ~ V)

Binary heap gives O(V log V) — excellent

Dense graphs (E ~ V²)

Unsorted array gives O(V²) — matches binary heap and avoids overhead

Fibonacci heap

Best asymptotic for dense graphs, but constant factors make it slower in practice for most inputs

Prim's with a binary heap has the same O(E log V) complexity as Kruskal's. Choose Prim's when the graph is given as an adjacency list and you want to avoid sorting all edges.

11

Borůvka's Algorithm

Algorithm (1926 — oldest MST algorithm)

  1. Start with each vertex as its own component
  2. In each phase, for every component: find the lightest edge leaving the component
  3. Add all such edges (handle ties consistently to avoid cycles)
  4. Repeat until only one component remains
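The four steps above as a runnable Python sketch (distinct edge weights assumed — the simplest way to handle ties; the example graph is illustrative):

```python
def boruvka(n, edges):
    """edges: list of (w, u, v) with distinct weights.
    Returns total MST weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        # phase: cheapest outgoing edge for every component
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                 # internal edge: ignore
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, ru, rv)
        # add all selected edges at once
        for w, ru, rv in cheapest.values():
            if find(ru) != find(rv):     # may already have merged this phase
                parent[find(ru)] = find(rv)
                total += w
                components -= 1
    return total

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
print(boruvka(4, edges))   # 6
```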

Complexity analysis

  • Each phase at least halves the number of components
  • Therefore at most O(log V) phases
  • Each phase scans all edges: O(E)
  • Total: O(E log V)

Advantages

Property         Benefit
Parallelisable   Each component acts independently per phase
No sorting       Unlike Kruskal's, edges need not be sorted
Merge-friendly   Natural fit for distributed and external-memory settings

Borůvka's is the basis of the randomised O(E) expected-time algorithm by Karger-Klein-Tarjan (1995).

12

Proof of Correctness

Theorem: the generic MST algorithm produces a minimum spanning tree

Invariant: at every step, the edge set A is a subset of some MST T*.

Base case

A = {} is trivially a subset of any MST.

Inductive step

Assume A ⊆ T*. We add a light edge e = (u, v) crossing a cut that respects A.

  • Case 1: e ∈ T*. Then A ∪ {e} ⊆ T*.
  • Case 2: e ∉ T*. Adding e to T* creates a cycle C containing another crossing edge e'. Since w(e) ≤ w(e'), swapping gives T** with w(T**) ≤ w(T*). So T** is an MST and A ∪ {e} ⊆ T**.

Termination

When |A| = |V| - 1, A spans all vertices and is a tree.

This single proof covers Kruskal's, Prim's, and Borůvka's — they differ only in how the cut is chosen.

13

MST Uniqueness Conditions

When is the MST unique?

Condition                                                            Unique?
All edge weights distinct                                            Yes — provably unique
Some equal weights, but no cycle has two edges of equal max weight   Yes
Equal weights with ambiguous cuts                                    Possibly multiple MSTs

Practical implication

When edge weights come from real-world measurements (floating point), ties are essentially impossible — the MST is almost always unique.

Proof of uniqueness with distinct weights

Suppose two distinct MSTs T1 and T2 exist. Let e be the lightest edge that lies in exactly one of the two trees; since all weights are distinct, this e is well defined. Say e ∈ T1.

  • Adding e to T2 creates a cycle
  • Not every edge of this cycle can lie in T1 (otherwise T1 would contain the whole cycle), so some cycle edge e' is in T2 but not in T1
  • Both e and e' lie in exactly one tree, and e was chosen as the lightest such edge, so w(e) < w(e')
  • Replacing e' with e in T2 gives a lighter spanning tree — contradicting T2 being an MST
14

Bottleneck Spanning Trees

Definition

A bottleneck spanning tree minimises the maximum edge weight across all spanning trees. The bottleneck value is the weight of the heaviest edge in the tree.

Key theorem

Every minimum spanning tree is a bottleneck spanning tree (but not necessarily vice versa).

Application

Minimax path problem

The path between any two vertices in the MST minimises the maximum edge weight among all paths — useful in network routing where the bottleneck link matters.
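One concrete way to compute this minimax bottleneck follows directly from the Kruskal view: processing edges in weight order, s and t become connected exactly when the bottleneck edge of their minimax path is added. A sketch (the example graph is illustrative):

```python
def minimax_path_weight(n, edges, s, t):
    """Weight of the heaviest edge on the minimax s-t path:
    run Kruskal's edge scan and report the weight at which
    s and t first fall into the same component.
    edges: list of (w, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
        if find(s) == find(t):
            return w          # bottleneck of the minimax path
    return None               # s and t are disconnected

edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]
print(minimax_path_weight(4, edges, 0, 3))   # 3 (path 0-1-2-3 beats direct edge of weight 4)
```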

Proof

Let TMST be an MST and TB be a bottleneck spanning tree. Suppose the bottleneck of TMST is edge e with weight w(e), and the bottleneck of TB is wB < w(e).

  • Removing e from TMST splits V into (S, V\S)
  • Any spanning tree must cross this cut, so TB has a crossing edge e' with w(e') ≤ wB < w(e)
  • Replacing e with e' in TMST yields a lighter spanning tree
  • This contradicts TMST being an MST
15

Steiner Trees

The problem

Given a graph G = (V, E) and a subset S ⊆ V of terminal vertices, find the minimum-weight tree connecting all terminals. Non-terminal vertices (Steiner points) may be included.

MST vs Steiner tree

Property       MST                 Steiner tree
Must span      All vertices in V   Only terminals in S
May include    All vertices        Any subset of V \ S
Complexity     Polynomial          NP-hard
Approximation  Exact               2-approx via MST

The MST heuristic (2-approximation)

  1. Compute shortest paths between all pairs of terminals
  2. Build a complete graph on S with shortest-path distances as weights
  3. Find the MST of this complete graph
  4. Map edges back to paths in G; remove redundant edges

The Steiner tree found is at most twice the weight of the optimal.
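Steps 1–3 of the heuristic as a runnable Python sketch (step 4, mapping closure edges back to paths in G, is omitted for brevity; the star graph at the bottom is an illustrative example where a Steiner point helps):

```python
import itertools

def steiner_upper_bound(n, edges, terminals):
    """MST heuristic: all-pairs shortest paths (Floyd-Warshall),
    then Kruskal on the terminal metric closure. The returned
    weight is at most twice the optimal Steiner tree weight."""
    INF = float('inf')
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for w, u, v in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):                 # Floyd-Warshall
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    # Kruskal on the complete graph over terminals
    closure = sorted((d[a][b], a, b)
                     for a, b in itertools.combinations(terminals, 2))
    parent = {t: t for t in terminals}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    total = 0
    for w, a, b in closure:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            total += w
    return total

# Star: terminals 0, 1, 2 all attach to hub 3 (a Steiner point)
edges = [(1, 0, 3), (1, 1, 3), (1, 2, 3)]
print(steiner_upper_bound(4, edges, [0, 1, 2]))   # 4 (optimum via the hub is 3)
```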

Steiner trees arise in VLSI chip design, network cable routing, and phylogenetic tree reconstruction.

16

MST in Dense vs Sparse Graphs

Definitions

Graph type  Edge count      Example
Sparse      E = O(V)        Road networks, planar graphs
Moderate    E = O(V log V)  Social network subgraphs
Dense       E = O(V²)       Complete or near-complete graphs

Algorithm selection

Scenario              Best algorithm           Complexity
Sparse (E ~ V)        Kruskal's + Union-Find   O(V log V)
Dense (E ~ V²)        Prim's + unsorted array  O(V²)
Very large, parallel  Borůvka's                O(E log V)
Expected linear       KKT randomised           O(E) expected

Theoretical frontier

Deterministic

Chazelle's algorithm achieves O(E · α(E, V)) — nearly linear

Randomised

Karger-Klein-Tarjan achieves O(E) expected time

Open problem

Does a deterministic O(E) MST algorithm exist? This remains one of the most tantalising open questions in combinatorial optimisation.

17

Distributed MST — GHS Algorithm

The setting

Each node in a network knows only its own edges and their weights. Nodes communicate by passing messages along edges. Goal: compute the MST cooperatively.

Gallager-Humblet-Spira (1983)

A distributed version of Borůvka's algorithm:

  1. Each node starts as its own fragment (component)
  2. Each fragment finds its minimum-weight outgoing edge (MWOE)
  3. Fragments merge along their MWOEs
  4. Repeat until one fragment spans the network

Complexity

Metric              Cost
Message complexity  O(E + V log V)
Time complexity     O(V log V)
Phases              O(log V)

Level-based merging

Fragments have levels. Two fragments at the same level merge and increase level by 1. A lower-level fragment is absorbed into a higher-level one without level increase. This prevents O(V) phases.

GHS is a foundational algorithm in distributed computing — it demonstrates that global structure (MST) can be computed with only local information and message passing.

18

Applications — Network Design

Telecommunications

Connect cities with minimum total cable length. The MST gives the cheapest connected network. Variants include degree-constrained MST and capacitated MST.

Computer networks

  • Spanning Tree Protocol (STP) — Ethernet switches compute a spanning tree to prevent broadcast loops
  • Overlay networks — multicast trees for content distribution built as MSTs over latency graphs

Transportation

Road networks, pipeline routing, power grid design — MST provides a baseline minimum-cost connected layout before adding redundancy.

Approximation backbone

MST is used as a subroutine in approximation algorithms for harder problems:

Problem         MST role
Metric TSP      MST gives 2-approx; Christofides uses MST + matching for 1.5-approx
Steiner tree    MST of terminal metric closure gives 2-approx
k-connectivity  Augment MST with additional edges

MST algorithms have been critical infrastructure in network engineering since the 1950s — the algorithms predate the Internet itself.

19

Applications — Clustering & Segmentation

Single-linkage clustering

  1. Build the MST of the data points (using pairwise distances as edge weights)
  2. Remove the k - 1 heaviest edges
  3. The resulting k connected components are the clusters

This is equivalent to single-linkage hierarchical clustering and produces a dendrogram naturally from the MST.
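The three steps above as a runnable Python sketch over 2-D points (the function and helper names are mine):

```python
from itertools import combinations

def mst_clusters(points, k):
    """Single-linkage clustering via the MST: build the MST of the
    complete Euclidean graph (Kruskal), drop the k-1 heaviest MST
    edges, and return the resulting components as clusters."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    for w, i, j in edges:              # Kruskal
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    # keep the n-k lightest MST edges; the k-1 heaviest are the cuts
    parent = list(range(n))
    for w, i, j in sorted(mst)[: n - k]:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(pts, 2))   # [[0, 1, 2], [3, 4]]
```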

Image segmentation

Felzenszwalb & Huttenlocher (2004):

  1. Build a graph where pixels are vertices and edges connect adjacent pixels with weight = colour difference
  2. Run a modified Kruskal's: merge components if the edge weight is below an adaptive threshold
  3. Result: over-segmented regions that respect strong boundaries

Other applications

Domain          Application
Bioinformatics  Phylogenetic tree estimation from genetic distances
Robotics        Coverage path planning over sensor networks
Finance         MST of asset correlation matrix reveals market structure
Cartography     Simplifying geographic networks for visualisation
CartographySimplifying geographic networks for visualisation

Why MST for clustering?

MST-based clustering is non-parametric — no need to specify the number of clusters in advance. The dendrogram allows cutting at any level. It naturally detects elongated, non-convex clusters that k-means misses.

20

Complexity Comparison & Summary

Algorithm comparison

Algorithm             Time            Space     Notes
Kruskal's             O(E log E)      O(E + V)  Sort edges; Union-Find. Best for sparse graphs
Prim's (binary heap)  O(E log V)      O(V)      Grow single tree. Best for adjacency lists
Prim's (Fibonacci)    O(E + V log V)  O(V)      Best asymptotic for dense graphs
Prim's (array)        O(V²)           O(V)      Simplest for dense adjacency matrices
Borůvka's             O(E log V)      O(E + V)  Parallelisable; no sorting needed
Chazelle              O(E · α(E,V))   O(E + V)  Nearly linear; complex implementation
KKT randomised        O(E) expected   O(E + V)  Optimal expected; verification subroutine

Key takeaways

The cut property is the single principle behind all greedy MST algorithms

Kruskal's and Prim's are the practical workhorses — choose based on graph density

MST is a subroutine in approximation algorithms for TSP, Steiner tree, and clustering

Recommended: CLRS ch. 21 & 23 · Kleinberg & Tardos ch. 4 · Tarjan, Data Structures and Network Algorithms

Papers: Karger-Klein-Tarjan (1995) · GHS (1983) · Felzenszwalb & Huttenlocher (2004)