From Shannon's theorem to the mathematics of reliable communication
Telegraph errors, Bell Labs, and the birth of information theory
Noisy channels, capacity, and Shannon's theorems
Block codes, convolutional codes, and their properties
Hamming distance, code rate, minimum distance
Singleton, Hamming, Plotkin, and Gilbert-Varshamov
Timeline of major achievements in coding theory
Since the earliest days of electrical communication, engineers faced a fundamental challenge: noise corrupts transmitted messages.
Morse code operators would request "please repeat" when signals were garbled by line noise, lightning, or crosstalk between wires.
Long-distance calls suffered from static, fading, and signal degradation over copper lines spanning hundreds of miles.
In 1948, Claude Shannon, a mathematician and electrical engineer at Bell Labs, published "A Mathematical Theory of Communication" in the Bell System Technical Journal.
Shannon showed that information could be quantified and that reliable communication was possible even over noisy channels — up to a fundamental limit called channel capacity.
Richard Hamming, another mathematician at Bell Labs, grew frustrated with the unreliability of early computers: electromechanical relay machines would fail on weekends when no operators were present.
"If the machine can detect an error, why can't it locate the position of the error and correct it?" — Hamming's famous question that launched error-correcting codes.
In 1950, Hamming published his landmark paper introducing Hamming codes — the first practical single-error-correcting codes. He also formalized the concept of Hamming distance.
Coding theory is the study of the properties of codes and their fitness for specific applications. It lies at the intersection of mathematics, computer science, and electrical engineering.
Source coding: Compresses data to remove redundancy. Goal: represent information with fewer bits.
Examples: Huffman coding, LZW, arithmetic coding
Channel coding: Adds controlled redundancy to detect/correct errors. Goal: reliable communication over noisy channels.
Examples: Hamming codes, Reed-Solomon, LDPC
Encoder: Maps message m of k bits to codeword c of n bits (where n > k)
Channel: Introduces errors — bits may be flipped, erased, or inserted
Decoder: Receives noisy r and attempts to recover original m
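A minimal sketch of this encoder/channel/decoder pipeline, using a 3-repetition code with majority-vote decoding (the simplest possible code; all function names here are illustrative):

```python
import random

def encode_repetition(message_bits, n=3):
    """Encoder: repeat each message bit n times (k bits -> n*k bits)."""
    return [b for b in message_bits for _ in range(n)]

def channel_bsc(codeword_bits, p, rng):
    """Channel: flip each bit independently with probability p."""
    return [b ^ 1 if rng.random() < p else b for b in codeword_bits]

def decode_repetition(received_bits, n=3):
    """Decoder: majority vote over each block of n received bits."""
    return [int(sum(received_bits[i:i + n]) > n // 2)
            for i in range(0, len(received_bits), n)]

rng = random.Random(0)
m = [1, 0, 1, 1]
c = encode_repetition(m)          # 4 bits -> 12 bits
r = channel_bsc(c, p=0.1, rng=rng)
m_hat = decode_repetition(r)
print(m, m_hat)
```

Each block of 3 survives any single bit flip, so the decoder recovers m unless two flips land in the same block.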
Shannon defined channel capacity as the maximum rate at which information can be reliably transmitted over a noisy channel.
For the binary symmetric channel, C = 1 − H(p), where p is the crossover (bit-flip) probability and H(p) = −p log₂ p − (1 − p) log₂(1 − p) is the binary entropy function.
For any rate R < C (channel capacity), there exist codes of rate R that achieve arbitrarily low error probability as the block length n → ∞.
Conversely, for any rate R > C, the error probability is bounded away from zero regardless of the code used.
Error-free communication IS possible at rates below capacity. We don't need infinite bandwidth to overcome noise.
Shannon's proof was non-constructive. He proved good codes exist using random coding arguments, but didn't show how to build them.
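The BSC capacity formula C = 1 − H(p) is easy to evaluate directly; a short sketch:

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel: C = 1 - H(p)."""
    return 1 - binary_entropy(p)

print(bsc_capacity(0.0))  # 1.0 -- noiseless channel, one bit per use
print(bsc_capacity(0.5))  # 0.0 -- output independent of input, useless
print(bsc_capacity(0.11))  # roughly 0.5 -- half a bit per channel use
```

Note the symmetry: a channel with p close to 1 is as good as one with p close to 0, since the receiver can simply invert every bit.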
The BSC is the simplest and most studied channel model. Each bit is independently flipped with probability p.
Channel Diagram
```
0 —(1−p)—→ 0
   ↘(p) ↗(p)
1 —(1−p)—→ 1
```
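A quick simulation confirms the model: passing a long all-zeros word through the channel flips roughly a fraction p of the bits (a small sketch; the helper name is illustrative):

```python
import random

def bsc(bits, p, rng=None):
    """Pass bits through a binary symmetric channel with crossover p."""
    rng = rng or random.Random()
    # bool ^ int works because bool is a subclass of int in Python
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(42)
received = bsc([0] * 100_000, p=0.1, rng=rng)
print(sum(received) / len(received))  # empirical flip rate, close to 0.1
```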
Block codes: Message is divided into fixed-length blocks of k bits. Each block is independently encoded into n bits.
Convolutional codes: Message bits are encoded using a sliding window with memory. Output depends on current and past input bits.
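To make the "sliding window with memory" idea concrete, here is a sketch of a rate-1/2 convolutional encoder with constraint length 3 (the generator pair (7, 5) in octal is a common textbook choice, assumed here, not taken from the lecture):

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 convolutional encoder, constraint length 3.

    Each input bit produces two output bits, computed as mod-2 inner
    products of the window [current bit, previous bit, bit before that]
    with the generator taps g1 and g2.
    """
    state = [0, 0]  # two memory elements, initially zero
    out = []
    for b in bits:
        window = [b] + state
        out.append(sum(w * g for w, g in zip(window, g1)) % 2)
        out.append(sum(w * g for w, g in zip(window, g2)) % 2)
        state = [b, state[0]]  # shift the new bit into memory
    return out

print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1]
```

Unlike a block encoder, the output for each input bit depends on the two previous input bits, so errors spread across neighboring output pairs.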
Error detection: Recognizes that an error has occurred, but cannot fix it. Requires retransmission (ARQ).
Error correction (FEC): Locates and corrects errors without retransmission. Essential for one-way channels.
A code C is a set of valid codewords. For a binary code of length n, C ⊆ {0,1}^n.
| Message | Codeword |
|---|---|
| 00 | 0000 |
| 01 | 0111 |
| 10 | 1011 |
| 11 | 1100 |
The code rate R = k/n measures the proportion of information bits in each codeword. For the (4, 2) code above, R = 2/4 = 1/2.
The Hamming distance d(x, y) between two binary strings x and y of equal length is the number of positions at which they differ.
d(10101, 10000) = 2
d(11111, 00000) = 5
d(01010, 01010) = 0
```python
def hamming_distance(x, y):
    """Compute Hamming distance between two bit strings."""
    if len(x) != len(y):
        raise ValueError("Strings must be equal length")
    return sum(a != b for a, b in zip(x, y))

# Examples
print(hamming_distance("10101", "10000"))  # 2
print(hamming_distance("11111", "00000"))  # 5
```
The minimum distance dmin of a code C is the smallest Hamming distance between any two distinct codewords: dmin = min { d(x, y) : x, y ∈ C, x ≠ y }.
| dmin | Errors Detected | Errors Corrected | Example |
|---|---|---|---|
| 1 | 0 | 0 | Uncoded |
| 2 | 1 | 0 | Parity check |
| 3 | 2 | 1 | Hamming(7,4) |
| 5 | 4 | 2 | BCH(15,7) |
| 7 | 6 | 3 | Golay(23,12) |
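The pattern in the table follows from two standard formulas: a code with minimum distance d detects up to d − 1 errors and corrects up to ⌊(d − 1)/2⌋. A one-line sketch:

```python
def detect_correct_capability(d_min):
    """Guaranteed detection and correction radius for minimum distance d_min.

    Detection: any error pattern of weight <= d_min - 1 cannot turn one
    codeword into another. Correction: spheres of radius t = (d_min-1)//2
    around codewords are disjoint, so nearest-codeword decoding succeeds.
    """
    return d_min - 1, (d_min - 1) // 2

for d in (1, 2, 3, 5, 7):
    detect, correct = detect_correct_capability(d)
    print(f"d_min={d}: detects {detect}, corrects {correct}")
```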
The Hamming weight wt(x) of a binary string x is the number of non-zero positions — equivalently, the distance from x to the all-zeros string.
```python
def hamming_weight(x):
    """Count the number of 1s in a binary string."""
    return sum(int(bit) for bit in x)

# For linear codes: d_min = minimum weight of non-zero codewords
def min_distance_linear(codewords):
    """Find minimum distance of a linear code."""
    non_zero = [c for c in codewords if any(b == '1' for b in c)]
    return min(hamming_weight(c) for c in non_zero)

# Example: simple (4,2) code
code = ['0000', '0111', '1011', '1100']
print(f"Minimum distance: {min_distance_linear(code)}")  # 2
```
```python
import itertools

def code_distance_matrix(codewords):
    """Compute all pairwise Hamming distances in a code."""
    n = len(codewords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(codewords[i], codewords[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

def minimum_distance(codewords):
    """Find the minimum distance of a code."""
    min_d = float('inf')
    for c1, c2 in itertools.combinations(codewords, 2):
        d = hamming_distance(c1, c2)
        min_d = min(min_d, d)
    return min_d

# Example: Hamming(7,4) - some codewords
hamming_74 = ['0000000', '1000110', '0100101', '1100011',
              '0010011', '1010101', '0110110', '1110000']
print(f"Minimum distance: {minimum_distance(hamming_74)}")  # 3
```
The Singleton bound (1964) gives the simplest upper bound on minimum distance for any code.
If we delete d − 1 positions from each codeword, the resulting shortened words must still be distinct (otherwise two codewords would agree in all remaining positions and differ in at most d − 1, contradicting dmin).
So 2^k ≤ 2^(n−d+1), giving d ≤ n − k + 1.
Codes that achieve the Singleton bound with equality are called Maximum Distance Separable (MDS) codes.
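Checking the Singleton bound, and whether a given code is MDS, is a one-line test (helper names here are illustrative):

```python
def singleton_ok(n, k, d):
    """Check the Singleton bound: d <= n - k + 1."""
    return d <= n - k + 1

def is_mds(n, k, d):
    """MDS codes meet the Singleton bound with equality."""
    return d == n - k + 1

# The (4, 2) example code from earlier has d = 2
print(singleton_ok(4, 2, 2), is_mds(4, 2, 2))  # True False (bound allows d = 3)
# A single parity-check code (n, n-1) with d = 2 is MDS
print(is_mds(8, 7, 2))  # True
```

Nontrivial binary MDS codes are rare; the bound is mostly achieved with equality over large alphabets (e.g. Reed-Solomon codes).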
The Hamming bound (sphere-packing bound) considers how many error patterns of weight ≤ t can be packed into the space {0,1}^n.
Each codeword is the center of a "sphere" of radius t in Hamming space. These spheres must not overlap (otherwise decoding is ambiguous). The total volume of all spheres cannot exceed 2^n: 2^k · (C(n,0) + C(n,1) + ... + C(n,t)) ≤ 2^n.
Codes achieving the Hamming bound with equality are called perfect codes. The Hamming spheres tile the entire space with no gaps.
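Perfection is easy to check numerically: the code is perfect exactly when 2^k times the sphere volume equals 2^n. A small sketch:

```python
from math import comb

def is_perfect(n, k, t):
    """Check whether an (n, k) t-error-correcting code meets the
    Hamming bound with equality: 2^k * V(n, t) == 2^n,
    where V(n, t) is the volume of a Hamming ball of radius t."""
    sphere = sum(comb(n, i) for i in range(t + 1))
    return 2**k * sphere == 2**n

print(is_perfect(7, 4, 1))    # True  -- Hamming(7,4)
print(is_perfect(15, 11, 1))  # True  -- Hamming(15,11)
print(is_perfect(23, 12, 3))  # True  -- binary Golay code
print(is_perfect(8, 4, 1))    # False -- extended Hamming is not perfect
```

Up to trivial cases, the Hamming codes and the Golay codes are the only nontrivial perfect binary codes.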
The Plotkin bound (1960) is most useful when the minimum distance d is close to the block length n (high-redundancy regime).
The Plotkin bound becomes tighter than the Hamming bound when the code rate is low.
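One common statement of the bound (the even-d form, valid when 2d > n; odd d uses a slight variant) limits the number of codewords to 2⌊d/(2d − n)⌋, which can be checked numerically:

```python
def plotkin_bound(n, d):
    """Plotkin bound for binary codes, even-d form: if 2d > n,
    a code of length n and minimum distance d has at most
    2 * floor(d / (2d - n)) codewords."""
    assert d % 2 == 0 and 2 * d > n, "even-d form requires 2d > n"
    return 2 * (d // (2 * d - n))

# High-redundancy example: length 10, minimum distance 6
print(plotkin_bound(10, 6))  # at most 6 codewords
print(plotkin_bound(8, 6))   # at most 2 codewords
```

Note how few codewords survive when d approaches n, in line with the bound being tightest at low rate.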
Unlike the previous bounds (which are upper bounds), the Gilbert-Varshamov (GV) bound is a lower bound — it guarantees that good codes exist.
Build the code greedily: add a new codeword if it's at distance ≥ d from all existing codewords. The process can continue as long as the Hamming balls of radius d − 1 don't cover all of {0,1}^n.
The GV bound matches the channel capacity in certain asymptotic regimes, suggesting that random linear codes are nearly optimal. Most practical codes fall between the GV and Hamming bounds.
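The greedy argument above can be run directly for small parameters; this sketch scans all of {0,1}^n in lexicographic order and keeps every word far enough from the code so far:

```python
from itertools import product

def hamming_distance(x, y):
    """Number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def greedy_gv_code(n, d):
    """Greedily build a binary code of length n with minimum distance d,
    mirroring the Gilbert-Varshamov existence argument."""
    code = []
    for candidate in product((0, 1), repeat=n):
        if all(hamming_distance(candidate, c) >= d for c in code):
            code.append(candidate)
    return code

code = greedy_gv_code(7, 3)
# GV guarantees at least 2^n / V(n, d-1) codewords; here 128/29, so >= 5
print(len(code))
```

For n = 7, d = 3 the greedy construction does much better than the GV guarantee, which illustrates that GV is only a lower bound on what exists.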
For binary codes of length n, the bounds constrain the relationship between rate R = k/n and relative distance δ = d/n:
| Bound | Type | Key Condition | Tightest When |
|---|---|---|---|
| Singleton | Upper | d ≤ n − k + 1 | Over large alphabets |
| Hamming | Upper | Sphere-packing | Low error correction |
| Plotkin | Upper | Counting argument | δ > 1/2 |
| Gilbert-Varshamov | Lower | Greedy construction | All ranges |
```python
from math import comb, log2

def hamming_bound_max_k(n, t):
    """Max k satisfying the Hamming bound for t-error correction."""
    sphere_size = sum(comb(n, i) for i in range(t + 1))
    return int(n - log2(sphere_size))

def singleton_bound_max_d(n, k):
    """Max d satisfying the Singleton bound."""
    return n - k + 1

# Example: n = 15
print(f"Hamming(15, t=1): max k = {hamming_bound_max_k(15, 1)}")        # 11
print(f"Singleton(15, k=11): max d = {singleton_bound_max_d(15, 11)}")  # 5
```
Lecture 02: Parity Checks & Simple Codes
Lecture 03: Hamming Codes
Lecture 04: CRC & Error Detection