Low-Density Parity-Check codes — from Gallager's forgotten thesis to the backbone of modern wireless, storage, and broadcast systems.
Gallager (1962), forgotten for 30 years, rediscovered by MacKay & Neal (1996).
Sparse parity-check matrices, Tanner graphs, regular vs irregular codes.
The encoding challenge and practical solutions.
Belief propagation, sum-product, min-sum algorithms.
Density evolution, thresholds, capacity-approaching designs.
Wi-Fi, 5G, DVB-S2, Ethernet — LDPC everywhere.
Robert Gallager's 1960 MIT PhD thesis, published as a monograph in 1963, introduced LDPC codes with a near-complete theory.
The work was decades ahead of its time. The computational requirements made it impractical in 1962, and the field moved on to convolutional codes and algebraic block codes.
1996: David MacKay and Radford Neal independently rediscovered LDPC codes, demonstrating near-Shannon-limit performance comparable to turbo codes.
Key insight: Modern computers could finally run the iterative decoder that Gallager envisioned.
Gallager received the IEEE Medal of Honor (1990) — though primarily for his textbook on information theory, not LDPC codes. The true impact of LDPC wasn't recognized until the 2000s.
A typical algebraic code (Hamming, BCH, RS) has a dense parity-check matrix — roughly half the entries are 1.
An LDPC matrix is sparse — the fraction of 1s goes to zero as the code length grows.
For an (n, k) LDPC code with m = n-k check equations, each row has wr ones and each column has wc ones, where wr, wc << n.
Dense H (Hamming-like):
1 0 1 1 0 1 1 0 1 0 ...
0 1 1 0 1 1 0 1 0 1 ...
Sparse H (LDPC):
1 0 0 1 0 0 0 1 0 0 ...
0 1 0 0 0 1 0 0 0 1 ...
Typical: wc = 3-6 ones per column, wr = 6-30 ones per row, regardless of n.
A Tanner graph is a bipartite graph representation of the parity-check matrix H. Introduced by Michael Tanner (1981) — another early LDPC-related contribution.
Variable nodes (VNs): One per codeword bit (columns of H). Represented as circles.
Check nodes (CNs): One per parity-check equation (rows of H). Represented as squares.
Edge between VNj and CNi if and only if H[i][j] = 1.
v1 v2 v3 v4 v5 v6
c1 [1 1 0 1 0 0]
c2 [0 1 1 0 1 0]
c3 [1 0 0 0 1 1]
↓ edges follow H[i][j]=1 ↓
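The edge set of this small example can be read straight off H; a minimal sketch (node labels v1/c1 match the figure, 1-based):

```python
import numpy as np

# Parity-check matrix from the example above: rows = check nodes,
# columns = variable nodes.
H = np.array([
    [1, 1, 0, 1, 0, 0],   # c1: v1 + v2 + v4 = 0
    [0, 1, 1, 0, 1, 0],   # c2: v2 + v3 + v5 = 0
    [1, 0, 0, 0, 1, 1],   # c3: v1 + v5 + v6 = 0
])

# Tanner graph edges: one (check, variable) pair wherever H[i][j] = 1
edges = [(f"c{i+1}", f"v{j+1}") for i, j in zip(*H.nonzero())]
print(edges)
print("VN degrees:", H.sum(axis=0))  # [2 2 1 1 2 1]
print("CN degrees:", H.sum(axis=1))  # [3 3 3]
```

Column sums give variable-node degrees, row sums give check-node degrees; this H is irregular (column weights vary between 1 and 2).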
Every variable node has degree dv (same column weight), every check node has degree dc (same row weight).
Gallager's original construction: (3,6)-regular gives rate 1/2.
Simple but not optimal — there's a gap to capacity.
Variable and check nodes have different degrees, drawn from a degree distribution.
Key insight (Richardson, Shokrollahi & Urbanke, 2001; pushed to the extreme by Chung, Forney, Richardson & Urbanke the same year): optimizing the degree distribution can get within 0.0045 dB of the Shannon limit!
Irregular codes perform better but are harder to construct and may have worse minimum distance. With one side's distribution held fixed, the degree-distribution optimization can be posed as a linear program against the threshold condition.
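The design rate follows directly from the edge-perspective degree distributions: R = 1 - (sum_i rho_i/i) / (sum_i lam_i/i). A small sketch (the regular case shown is just lambda(x) = x^2, rho(x) = x^5):

```python
import numpy as np

def design_rate(lam, rho):
    """Design rate of an irregular LDPC ensemble.
    lam[i], rho[i] = fraction of EDGES attached to degree-(i+1)
    variable/check nodes (edge-perspective distributions).
    The sums below are the integrals of lambda(x), rho(x) on [0, 1]."""
    degs_v = np.arange(1, len(lam) + 1)
    degs_c = np.arange(1, len(rho) + 1)
    int_lam = np.sum(np.asarray(lam) / degs_v)
    int_rho = np.sum(np.asarray(rho) / degs_c)
    return 1.0 - int_rho / int_lam

# A (3,6)-regular code as a degenerate "irregular" ensemble:
# all edges on degree-3 VNs, all edges on degree-6 CNs.
lam_reg = [0, 0, 1.0]            # lambda(x) = x^2
rho_reg = [0, 0, 0, 0, 0, 1.0]   # rho(x) = x^5
print(design_rate(lam_reg, rho_reg))  # 0.5
```

Plugging in the regular pair recovers rate 1/2, matching 1 - dv/dc.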
1. Build a first section of m/wc rows in which each column has exactly one 1 (each row has wr consecutive 1s)
2. Generate the remaining wc-1 sections as random column permutations of the first
3. Stack the sections vertically to form H
Short cycles (especially 4-cycles) in the Tanner graph hurt iterative decoding.
Progressive Edge Growth (PEG) algorithm: add edges one at a time, maximizing local girth.
| Method | Pros | Cons |
|---|---|---|
| Random | Good performance | Encoding complexity |
| PEG | High girth | Slow construction |
| Algebraic | Structured, easy encode | Less flexible |
| QC-LDPC | Hardware-friendly | Slightly suboptimal |
| Protograph | Excellent performance | Complex design |
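A simplified PEG sketch (illustrative only: when every check is already reachable from the variable node, this version falls back to any minimum-degree non-neighbor, whereas the full algorithm picks from the deepest BFS level):

```python
import numpy as np

def peg_construct(n, m, dv, seed=0):
    """Simplified Progressive Edge Growth: add dv edges per variable
    node, preferring check nodes NOT yet reachable in the current
    Tanner graph (adding such an edge creates no new cycle)."""
    rng = np.random.default_rng(seed)
    vn_adj = [set() for _ in range(n)]  # checks attached to each VN
    cn_adj = [set() for _ in range(m)]  # variables attached to each CN

    def reachable_cns(v):
        """BFS from VN v over the bipartite graph; returns all
        check nodes reachable from v."""
        reached, frontier, seen = set(), {v}, {v}
        while frontier:
            new_cns = {c for u in frontier for c in vn_adj[u]} - reached
            if not new_cns:
                break
            reached |= new_cns
            frontier = {u for c in new_cns for u in cn_adj[c]} - seen
            seen |= frontier
        return reached

    for v in range(n):
        for _ in range(dv):
            cand = set(range(m)) - reachable_cns(v) - vn_adj[v]
            if not cand:                      # everything reachable:
                cand = set(range(m)) - vn_adj[v]   # simplified fallback
            # Among candidates, pick a minimum-degree check node
            min_deg = min(len(cn_adj[c]) for c in cand)
            best = [c for c in cand if len(cn_adj[c]) == min_deg]
            c = int(rng.choice(sorted(best)))
            vn_adj[v].add(c)
            cn_adj[c].add(v)

    H = np.zeros((m, n), dtype=int)
    for v, checks in enumerate(vn_adj):
        for c in checks:
            H[c, v] = 1
    return H

H_peg = peg_construct(n=20, m=10, dv=3)
print("column weights all dv:", bool((H_peg.sum(axis=0) == 3).all()))  # True
```

Every column ends up with weight exactly dv; row weights vary slightly, which is typical of PEG output.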
import numpy as np
from scipy.sparse import csr_matrix
def gallager_ldpc(n, dv, dc):
    """
    Construct a regular (dv, dc)-LDPC parity-check matrix
    using Gallager's random construction.

    n  = code length (number of variable nodes)
    dv = column weight (variable node degree)
    dc = row weight (check node degree)
    """
    assert (n * dv) % dc == 0, "n*dv must be divisible by dc"
    assert n % dc == 0, "n must be divisible by dc (full first section)"
    m = (n * dv) // dc  # number of check nodes
    rate = 1.0 - dv / dc
    # Build base submatrix: m/dv rows, each row has dc consecutive 1s
    rows_per_section = m // dv
    H = np.zeros((m, n), dtype=int)
    # First section: identity-like structure
    for i in range(rows_per_section):
        for j in range(dc):
            H[i, i * dc + j] = 1
    # Remaining sections: random column permutations of the first
    for section in range(1, dv):
        perm = np.random.permutation(n)
        for i in range(rows_per_section):
            for j in range(dc):
                col = perm[i * dc + j]
                H[section * rows_per_section + i, col] = 1
    print(f"LDPC({n},{n-m}): rate={rate:.3f}, "
          f"dv={dv}, dc={dc}, density={dv/m:.4f}")
    return csr_matrix(H)
# Example: (3,6)-regular LDPC code, rate 1/2
H = gallager_ldpc(n=1200, dv=3, dc=6)
print(f"H shape: {H.shape}, nnz: {H.nnz}, "
f"density: {H.nnz/(H.shape[0]*H.shape[1]):.4f}")
Unlike turbo or convolutional codes, LDPC codes are defined by H, not G. Encoding requires finding a generator matrix or using efficient alternatives.
Compute G from H using Gaussian elimination: bring H into systematic form [A | I_m]; then G = [I_k | Aᵀ] satisfies G·Hᵀ = 0 (mod 2).
Problem: G is generally dense even when H is sparse, so encoding by matrix multiplication costs O(n²).
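A sketch of the Gaussian-elimination route over GF(2) (it assumes the last m columns of H are invertible mod 2; a full implementation would also permute columns when they are not):

```python
import numpy as np

def generator_from_H(H):
    """Row-reduce H over GF(2) to [A | I_m]; then G = [I_k | A^T]
    satisfies G H^T = 0 (mod 2), i.e. every row of G is a codeword."""
    H = H.copy() % 2
    m, n = H.shape
    k = n - m
    for r in range(m):
        col = k + r                       # pivot on the last m columns
        pivot_rows = np.nonzero(H[r:, col])[0]
        assert pivot_rows.size > 0, "last m columns not invertible"
        p = r + pivot_rows[0]
        H[[r, p]] = H[[p, r]]             # swap pivot row into place
        for r2 in range(m):               # clear the column elsewhere
            if r2 != r and H[r2, col]:
                H[r2] ^= H[r]
    A = H[:, :k]                          # H is now [A | I_m]
    return np.hstack([np.eye(k, dtype=int), A.T])

# Toy example: a (7,4) Hamming code already in [A | I_3] form
H_ham = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])
G = generator_from_H(H_ham)
print(G.shape)  # (4, 7)
print("all syndromes zero:", not ((G @ H_ham.T) % 2).any())  # True
```

Even in this toy case, A is dense relative to a typical LDPC H, illustrating why direct G-based encoding does not scale.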
Approximate lower triangular form via careful row/column permutations:

H = [ A  B  T ]
    [ C  D  E ]

where T is lower triangular. Encoding becomes O(n + g²), where g is a small "gap" parameter (the number of rows outside the triangular part).
H is composed of circulant permutation matrices. Encoding reduces to shift-register operations — O(n) complexity with simple hardware. This is why standards use QC-LDPC.
Belief propagation (BP) is an iterative message-passing algorithm on the Tanner graph. Messages flow between variable nodes and check nodes, refining bit estimates.
Each VN sends its current belief (LLR) to connected CNs, excluding the message from that specific CN.
Each CN computes parity constraint satisfaction from connected VNs, excluding the target VN.
1. Initialize: Set variable-to-check messages to channel LLRs
2. Check update: Each check node computes outgoing messages using the tanh rule
3. Variable update: Each variable node sums incoming check messages + channel LLR
4. Tentative decision: Compute total LLR, make hard decision
5. Check syndrome: If H·ĉ = 0, stop. Otherwise go to step 2.
Typically converges in 5-50 iterations.
For tree-like graphs, BP computes exact marginals. For graphs with cycles, it's an approximation — but works remarkably well in practice.
def bp_decode(H, llr_channel, max_iter=50):
    m, n = H.shape
    # Messages: variable-to-check and check-to-variable
    v2c = np.zeros((m, n))  # only nonzero where H[i][j]=1
    c2v = np.zeros((m, n))
    # Initialize v2c messages with channel LLRs
    for i, j in zip(*H.nonzero()):
        v2c[i, j] = llr_channel[j]
    for iteration in range(max_iter):
        # Check node update (tanh rule)
        for i in range(m):
            vn_indices = H[i].nonzero()[1]
            for j in vn_indices:
                product = 1.0
                for j2 in vn_indices:
                    if j2 != j:
                        product *= np.tanh(v2c[i, j2] / 2.0)
                c2v[i, j] = 2.0 * np.arctanh(np.clip(product, -0.9999, 0.9999))
        # Variable node update
        total_llr = llr_channel.copy()
        for j in range(n):
            cn_indices = H[:, j].nonzero()[0]
            total_llr[j] = llr_channel[j] + sum(c2v[i, j] for i in cn_indices)
            for i in cn_indices:
                v2c[i, j] = total_llr[j] - c2v[i, j]
        # Hard decision and syndrome check
        decoded = (total_llr < 0).astype(int)
        if np.all((H @ decoded) % 2 == 0):
            return decoded, iteration + 1
    return decoded, max_iter
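A usage sketch: since the all-zero word is a codeword of every linear code, we can exercise the decoder without an encoder by sending it over BPSK/AWGN (bit 0 maps to +1, matching the decoder's positive-LLR-means-0 convention) and computing channel LLRs as 2y/σ²:

```python
import numpy as np

# All-zero codeword simulation: no encoder needed.
rng = np.random.default_rng(42)
n_bits, snr_db = 1200, 2.0
sigma2 = 10 ** (-snr_db / 10)          # AWGN noise variance (Es = 1)
bits = np.zeros(n_bits, dtype=int)
x = 1.0 - 2.0 * bits                   # BPSK: bit 0 -> +1, bit 1 -> -1
y = x + rng.normal(0.0, np.sqrt(sigma2), n_bits)
llr_channel = 2.0 * y / sigma2         # positive LLR = likely 0
print(f"mean LLR {llr_channel.mean():.2f}, ideal {2 / sigma2:.2f}")
```

With H = gallager_ldpc(n=1200, dv=3, dc=6) from above, decoded, iters = bp_decode(H, llr_channel) should then typically return the all-zero word, since this operating point sits well above the (3,6) threshold.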
The tanh/arctanh in sum-product is expensive. Min-sum approximates the check-node update: the outgoing message's sign is the product of the other incoming signs, and its magnitude is the minimum of the other incoming magnitudes.
Only needs signs, absolute values, and a minimum: no transcendental functions!
~0.2-0.5 dB loss compared to sum-product, but much simpler hardware.
Normalized min-sum: Multiply result by a correction factor α ≈ 0.75. Recovers ~0.1 dB.
Offset min-sum: Subtract a small constant β from the minimum. Similar improvement.
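As a concrete sketch, here is one check node's min-sum update; alpha is the normalized-min-sum scaling (alpha=1.0 gives plain min-sum):

```python
import numpy as np

def minsum_check_update(v2c, alpha=1.0):
    """Min-sum outgoing messages for one check node.
    v2c: incoming variable-to-check LLRs (1D array, assumed nonzero).
    The message on edge j uses the sign product and minimum magnitude
    over all OTHER edges; alpha < 1 gives normalized min-sum."""
    v2c = np.asarray(v2c, dtype=float)
    signs = np.sign(v2c)
    sign_prod = np.prod(signs)
    mags = np.abs(v2c)
    # Min over "the others" is the global min, except on the edge that
    # holds the min itself, which gets the second-smallest magnitude.
    order = np.argsort(mags)
    min1, min2 = mags[order[0]], mags[order[1]]
    out = np.where(np.arange(len(v2c)) == order[0], min2, min1)
    # sign_prod * signs[j] equals the product of the other signs
    return alpha * sign_prod * signs * out

msgs = minsum_check_update([+2.1, -0.8, +3.4, -1.5])
print(msgs)  # -> +0.8, -1.5, +0.8, -0.8
```

Only two magnitudes (the smallest and second smallest) and one sign product are needed per check node, which is exactly what makes hardware implementations so compact.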
| Algorithm | Operations | Loss |
|---|---|---|
| Sum-Product | tanh, atanh, ×, + | 0 dB (reference) |
| Min-Sum | min, sign, compare | ~0.3 dB |
| Normalized MS | min, sign, × α | ~0.1 dB |
| Offset MS | min, sign, - β | ~0.1 dB |
Nearly all commercial LDPC decoders use min-sum variants. The sum-product is mainly a theoretical/simulation reference.
Positive LLR = likely 0; Negative LLR = likely 1
Check nodes compute: "Based on the OTHER variables in my parity check, what should this variable be?"
If c1 connects v1, v2, v4 and v2 looks like a 1, then c1 tells v1: "you should be a 1 too" (to satisfy parity).
Messages converge. Variable nodes accumulate evidence from all their check node neighbors, which in turn gather information from distant parts of the code.
Information propagates across the graph — v1 eventually "hears from" all other bits through multi-hop paths.
Density evolution tracks the probability distribution of messages through iterations, assuming the Tanner graph is a tree (valid for large codes).
For a given degree distribution, there exists a channel parameter threshold:
Below threshold: error probability → 0 as iterations → ∞
Above threshold: error probability stays bounded away from 0
For large n, the actual code performance concentrates around the ensemble average predicted by density evolution — a random code is almost surely good.
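On the BEC, density evolution collapses to a scalar recursion for the erasure probability x of variable-to-check messages: x ← eps * (1 - (1 - x)^(dc-1))^(dv-1). A sketch that locates a regular ensemble's threshold by bisection:

```python
def bec_converges(eps, dv, dc, iters=2000):
    """Run BEC density evolution for a (dv, dc)-regular ensemble."""
    x = eps
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if x < 1e-12:
            return True          # erasure probability -> 0
    return False

def bec_threshold(dv, dc, tol=1e-6):
    """Bisect for the largest channel erasure rate that converges."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bec_converges(mid, dv, dc) else (lo, mid)
    return lo

print(f"(3,6) BEC threshold: {bec_threshold(3, 6):.4f}")  # ~0.4294
# Shannon limit for rate 1/2 on the BEC is eps = 0.5
```

The gap between 0.4294 and 0.5 is the BEC analogue of the 0.92 dB AWGN gap in the table below; irregular distributions shrink it.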
| Code | Rate | Threshold | Shannon | Gap |
|---|---|---|---|---|
| (3,6)-regular | 1/2 | 1.11 dB | 0.187 dB | 0.92 dB |
| Irregular (opt) | 1/2 | 0.19 dB | 0.187 dB | 0.003 dB! |
| (4,8)-regular | 1/2 | 1.00 dB | 0.187 dB | 0.81 dB |
Optimized irregular LDPC: within 0.0045 dB of capacity for rate 1/2 on the binary-input AWGN channel!
| Property | LDPC | Turbo |
|---|---|---|
| Gap to capacity (R=1/2, BER=10⁻⁵) | ~0.04 dB | ~0.3 dB |
| Error floor | Lower (below 10⁻¹⁰) | Higher (~10⁻⁷) |
| Decoding parallelism | Highly parallel | Limited (sequential trellis) |
| Decoding latency | Lower per iteration | Higher (full trellis walk) |
| Short block performance | Worse | Better |
| Encoding complexity | Higher (unless QC) | Linear |
For long blocks (n > 10,000) and high throughput, LDPC consistently outperforms turbo codes. This is why 5G chose LDPC for the data channel.
LDPC has been optional since 802.11n and mandatory in 802.11ax (Wi-Fi 6) for higher MCS indices.
| Parameter | Value |
|---|---|
| Code type | QC-LDPC |
| Codeword lengths | 648, 1296, 1944 |
| Rates | 1/2, 2/3, 3/4, 5/6 |
| Sub-matrix size | 27, 54, 81 |
1. ~1.5 dB gain over BCC at high rates
2. QC structure enables parallel decoding
3. Low error floor meets WiFi requirements
4. Scales well to 160/320 MHz channels
802.11be (Wi-Fi 7) requires LDPC for all OFDMA and MU-MIMO modes. BCC (convolutional) is being phased out.
3GPP chose LDPC over turbo codes for 5G NR data channels (PDSCH/PUSCH). This was a contentious decision — the "coding war" of 2016.
| Type | Quasi-cyclic (QC-LDPC) |
| Base graphs | BG1 (large blocks), BG2 (small blocks) |
| Lifting sizes | 2-384 (51 values) |
| Max code length | ~25,000 bits |
| Rates | 1/5 to ~0.95 |
| HARQ | Incremental redundancy |
Throughput: 5G requires 20 Gbps. LDPC's parallel decoder architecture supports this; turbo codes' serial trellis doesn't.
Low floor: 5G data needs BER < 10⁻⁵ at BLER < 10⁻¹; LDPC's low error floor is a natural fit.
Rate flexibility: QC-LDPC with multiple lifting sizes provides fine-grained rate matching.
DVB-S2 (2004) was one of the first major standards to adopt LDPC codes, combined with BCH as an outer code.
| Normal frame | 64,800 bits |
| Short frame | 16,200 bits |
| Rates | 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 8/9, 9/10 |
DVB-S2 operates within 0.7-1.0 dB of Shannon limit across all rates — a major improvement over DVB-S's RS+convolutional codes.
DVB-S2X (2014) added even more rates and modulation orders, tightening the gap further.
DVB-S2 demonstrated that LDPC codes could be practically deployed in consumer hardware (satellite receivers) — a major milestone.
10GBase-T sends 10 Gbps over Cat 6a/7 cables up to 100 meters. The signal environment is extremely harsh — crosstalk, echo, ISI. LDPC coding is essential.
| Code | (2048, 1723) LDPC |
| Rate | 0.841 |
| Column weight | 6 |
| Row weight | 32 |
| Decoder iterations | 6-12 |
| Throughput | >10 Gbps sustained |
Coding gain: ~6 dB over uncoded. Without this, 10 Gbps over copper would require higher-quality (and more expensive) cables.
The LDPC decoder chip must run at extremely high clock rates — QC structure and min-sum decoding are essential.
A QC-LDPC code's H matrix is composed of circulant permutation matrices (or zero matrices). This structure enables massively parallel hardware decoders.
H is an array of Z×Z circulant blocks: H = [P^(a_ij)], where P is the Z×Z identity matrix cyclically shifted by one position, so P^(a_ij) is a cyclic shift by a_ij positions; a_ij = -1 denotes the Z×Z zero block.
Z processing units work in parallel
Barrel shifters replace random interconnects
Compact memory: store only shift values
Regular routing = simpler chip layout
Modern QC-LDPC decoders achieve 100+ Gbps in hardware (ASIC/FPGA) — enabling 5G, Wi-Fi 6/7, and 100G Ethernet.
The girth of a Tanner graph is the length of the shortest cycle. Higher girth = better BP performance.
| Girth | Impact |
|---|---|
| 4 | Very bad — correlated messages after 2 iterations |
| 6 | Acceptable — most practical codes |
| 8+ | Good — messages stay nearly independent longer |
4-cycles: if H[i1][j1]=H[i1][j2]=H[i2][j1]=H[i2][j2]=1, there's a 4-cycle. These cause messages to "echo" immediately.
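The 4-cycle condition above can be checked in bulk: entry (i1, i2) of H·Hᵀ counts the variable nodes shared by checks i1 and i2, and every pair of shared variables closes one 4-cycle. A sketch:

```python
import numpy as np

def count_4cycles(H):
    """Count 4-cycles in the Tanner graph of H.
    (H @ H.T)[i1, i2] = number of variable nodes shared by checks
    i1 and i2; each pair of shared variables closes one 4-cycle."""
    overlap = H @ H.T
    m = H.shape[0]
    cycles = 0
    for i1 in range(m):
        for i2 in range(i1 + 1, m):
            t = overlap[i1, i2]
            if t >= 2:
                cycles += t * (t - 1) // 2   # C(t, 2) cycles per pair
    return cycles

# Rows 0 and 1 share columns 0 and 1: exactly one 4-cycle
H_bad = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1]])
print(count_4cycles(H_bad))  # 1
```

PEG and good QC shift assignments are designed precisely to drive this count to zero (girth at least 6).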
Small substructures in the Tanner graph where BP gets "trapped" — unable to correct a small number of errors.
An (a,b) trapping set: a variable nodes connected to b odd-degree check nodes. Responsible for the error floor.
Post-processing: detect trapping sets, flip bits
Code design: avoid known trapping set patterns
Outer code: BCH/CRC cleans up residual errors
import numpy as np
def build_qc_ldpc(base_matrix, Z):
    """
    Construct a QC-LDPC parity-check matrix from a base matrix.

    base_matrix: 2D array where entry a_ij is a shift value
                 (-1 means zero block)
    Z: lifting factor (circulant block size)
    """
    mb, nb = base_matrix.shape
    H = np.zeros((mb * Z, nb * Z), dtype=int)
    for i in range(mb):
        for j in range(nb):
            shift = base_matrix[i, j]
            if shift >= 0:
                # Z x Z identity matrix cyclically shifted by 'shift'
                block = np.roll(np.eye(Z, dtype=int), shift, axis=1)
                H[i*Z:(i+1)*Z, j*Z:(j+1)*Z] = block
    return H
# Example: Simple rate-1/2 QC-LDPC
# Base matrix (3 x 6, rate = 1 - 3/6 = 1/2)
base = np.array([
[ 0, 1, 3, -1, 2, -1],
[-1, 0, 2, 1, -1, 3],
[ 2, -1, -1, 3, 0, 1]
])
Z = 8 # Lifting factor
H = build_qc_ldpc(base, Z)
print(f"QC-LDPC H: {H.shape[0]} x {H.shape[1]}")
print(f"Rate: {1 - H.shape[0]/H.shape[1]:.3f}")
print(f"Column weights: {H.sum(axis=0)[:Z]}")
print(f"Row weights: {H.sum(axis=1)[:Z]}")
print(f"Density: {H.sum() / H.size:.4f}")
The base matrix is much smaller than H (e.g., 46×68 for 5G NR BG1 vs 17,664×26,112 after lifting with Z=384). Standards only need to specify the base matrix.
| Property | LDPC | Turbo | Polar | RS/BCH |
|---|---|---|---|---|
| Year | 1962/1996 | 1993 | 2009 | 1950s-60s |
| Gap to capacity | <0.1 dB | ~0.3 dB | ~0.3 dB | 2+ dB |
| Error floor | Very low | Moderate | None (SC) | None |
| Decoder parallelism | Excellent | Poor | Moderate | Good |
| Best block size | Long (>1000) | Medium (100-6000) | Medium-Long | Short-Medium |
| Encoding complexity | O(n) QC | O(n) | O(n log n) | O(n) |
| 1962 | Gallager's thesis |
| 1981 | Tanner graphs |
| 1996 | MacKay & Neal rediscovery |
| 2001 | Density evolution & irregular codes |
| 2004 | DVB-S2 standard |
| 2006 | 10GBase-T Ethernet (802.3an) |
| 2009 | Wi-Fi 802.11n |
| 2018 | 5G NR data channel |
Nearest to Shannon limit of any practical code
Massively parallel decoding architecture
Low error floor with proper design
Flexible rate and length via QC structure
Mature, well-understood theory
From a forgotten thesis to the most widely deployed channel code in modern communications — LDPC codes are the ultimate "slow burn" success story of coding theory.