Coding Theory Series — 01

Foundations of Coding Theory

From Shannon's theorem to the mathematics of reliable communication

Mathematics History Code

Roadmap

Origins

Telegraph errors, Bell Labs, and the birth of information theory

Core Theory

Noisy channels, capacity, and Shannon's theorems

Code Types

Block codes, convolutional codes, and their properties

Key Metrics

Hamming distance, code rate, minimum distance

Bounds

Singleton, Hamming, Plotkin, and Gilbert-Varshamov

Milestones

Timeline of major achievements in coding theory

The Problem of Noise

History

Since the earliest days of electrical communication, engineers faced a fundamental challenge: noise corrupts transmitted messages.

Telegraph Era (1840s)

Morse code operators would request "please repeat" when signals were garbled by line noise, lightning, or crosstalk between wires.

Telephone Era (1900s)

Long-distance calls suffered from static, fading, and signal degradation over copper lines spanning hundreds of miles.

The core question: Can we add structure to our messages so that errors introduced during transmission can be detected — or even corrected — automatically?

Claude Shannon (1916–2001)

History

The Father of Information Theory

In 1948, Claude Shannon, a mathematician and electrical engineer at Bell Labs, published "A Mathematical Theory of Communication" in the Bell System Technical Journal.

Shannon showed that information could be quantified and that reliable communication was possible even over noisy channels — up to a fundamental limit called channel capacity.

Key Contributions

  • Defined information entropy (bits)
  • Proved the noisy channel coding theorem
  • Established channel capacity as a fundamental limit
  • Showed error-free communication is theoretically possible

Richard Hamming (1915–1998)

History

Also at Bell Labs, Richard Hamming was a mathematician who grew frustrated with the unreliability of early computers. Electromechanical relay machines would fail on weekends when no operators were present.

The Frustration

"If the machine can detect an error, why can't it locate the position of the error and correct it?" — Hamming's famous question that launched error-correcting codes.

The Breakthrough

In 1950, Hamming published his landmark paper introducing Hamming codes — the first practical single-error-correcting codes. He also formalized the concept of Hamming distance.

Shannon proved reliable communication was possible; Hamming showed how to do it.

What Is Coding Theory?

Mathematics

Coding theory is the study of the properties of codes and their fitness for specific applications. It lies at the intersection of mathematics, computer science, and electrical engineering.

Source Coding

Compresses data to remove redundancy. Goal: represent information with fewer bits.

Examples: Huffman coding, LZW, arithmetic coding

Channel Coding

Adds controlled redundancy to detect/correct errors. Goal: reliable communication over noisy channels.

Examples: Hamming codes, Reed-Solomon, LDPC

This series focuses on channel coding — the art and science of adding just enough redundancy to combat errors without wasting bandwidth.

The Noisy Channel Model

Mathematics
Source (message m) → Encoder (m → c) → Channel (+ noise) → Decoder (r → m') → Sink (message m')

Encoder: Maps message m of k bits to codeword c of n bits (where n > k)

Channel: Introduces errors — bits may be flipped, erased, or inserted

Decoder: Receives noisy r and attempts to recover original m

Redundancy = n − k bits   |   Code Rate R = k / n
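
The pipeline above can be sketched end to end with the simplest channel code of all, a 3-fold repetition code (a minimal illustration of the encoder/channel/decoder roles, not a practical code; the helper names are our own):

```python
import random

def encode(message, reps=3):
    """Encoder: map each message bit to `reps` copies (k=1 -> n=3 per bit)."""
    return ''.join(bit * reps for bit in message)

def channel(codeword, p=0.1, seed=42):
    """Binary symmetric channel: flip each bit independently with probability p."""
    rng = random.Random(seed)
    return ''.join(('1' if b == '0' else '0') if rng.random() < p else b
                   for b in codeword)

def decode(received, reps=3):
    """Decoder: majority vote over each group of `reps` received bits."""
    groups = [received[i:i + reps] for i in range(0, len(received), reps)]
    return ''.join('1' if g.count('1') > reps // 2 else '0' for g in groups)

m = '1011'
c = encode(m)          # '111000111111' (n = 12, k = 4, R = 1/3)
r = channel(c)         # some bits flipped by noise
print(m, '->', c, '->', r, '->', decode(r))
```

Any single flipped bit per 3-bit group is corrected by the majority vote, at the cost of tripling the bandwidth (R = 1/3).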

Channel Capacity

Mathematics

Shannon defined channel capacity as the maximum rate at which information can be reliably transmitted over a noisy channel.

C = max_{p(x)} I(X; Y)    [bits per channel use]

Binary Symmetric Channel (BSC)

C = 1 − H(p) = 1 + p·log₂(p) + (1−p)·log₂(1−p)

where p is the crossover (bit-flip) probability and H(p) is the binary entropy function

Example: For a BSC with p = 0.1, capacity C ≈ 0.531 bits per use. You can reliably transmit at any rate R < C, but not at rates above C.
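
The BSC capacity formula is easy to evaluate numerically; a quick sketch (`bsc_capacity` is an illustrative name, not a library function):

```python
from math import log2

def bsc_capacity(p):
    """C = 1 - H(p) for a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):                           # H(0) = H(1) = 0
        return 1.0
    h = -p * log2(p) - (1 - p) * log2(1 - p)      # binary entropy H(p)
    return 1 - h

print(round(bsc_capacity(0.1), 3))   # 0.531 bits per use
print(bsc_capacity(0.5))             # 0.0 -- useless channel
print(bsc_capacity(0.0))             # 1.0 -- perfect channel
```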

Shannon's Noisy Channel Theorem

Mathematics History

The Theorem (1948)

For any rate R < C (channel capacity), there exist codes of rate R that achieve arbitrarily low error probability as the block length n → ∞.

Conversely, for any rate R > C, the error probability is bounded away from zero regardless of the code used.

The Good News

Error-free communication IS possible at rates below capacity. We don't need infinite bandwidth to overcome noise.

The Catch

Shannon's proof was non-constructive. He proved good codes exist using random coding arguments, but didn't show how to build them.

The search for practical codes that approach channel capacity drove 70+ years of research. LDPC and turbo codes finally achieved this in the 1990s–2000s.

Binary Symmetric Channel (BSC)

Mathematics

The BSC is the simplest and most studied channel model. Each bit is independently flipped with probability p.

Channel Diagram

0 —(1−p)—→ 0
0 ——(p)——→ 1
1 ——(p)——→ 0
1 —(1−p)—→ 1

Properties

  • Memoryless: errors are independent
  • Symmetric: P(0→1) = P(1→0) = p
  • Binary: input and output are {0, 1}
  • When p = 0.5, channel is useless (C = 0)
  • When p = 0, channel is perfect (C = 1)

P(received = r | sent = c) = p^d(c,r) · (1−p)^(n−d(c,r))
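
Because p < 1/2 makes p^d · (1−p)^(n−d) strictly decreasing in d, maximizing this likelihood is the same as minimizing Hamming distance, so maximum-likelihood decoding on a BSC is minimum-distance decoding. A small sketch (helper names are our own):

```python
def likelihood(c, r, p):
    """P(received = r | sent = c) on a BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(c, r))   # Hamming distance d(c, r)
    return p**d * (1 - p)**(len(c) - d)

code = ['0000', '0111', '1011', '1100']     # a small example code
r = '0101'                                  # received word

# Most likely codeword == nearest codeword (when p < 1/2)
best = max(code, key=lambda c: likelihood(c, r, p=0.1))
print(best)   # '0111', at distance 1 from r
```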

Types of Codes

Mathematics

Block Codes

Message is divided into fixed-length blocks of k bits. Each block is independently encoded into n bits.

  • Denoted (n, k) or [n, k, d]
  • Encoding is memoryless between blocks
  • Examples: Hamming, Reed-Solomon, BCH, LDPC

[n, k, d] code over F_q

Convolutional Codes

Message bits are encoded using a sliding window with memory. Output depends on current and past input bits.

  • Described by (n, k, K) where K = constraint length
  • Encoding has memory (state machine)
  • Decoded by Viterbi algorithm (1967)
Rate = k/n, Memory = K−1

This series focuses primarily on block codes, which are more naturally analyzed using algebraic structures (finite fields, polynomials, matrices).
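
For contrast with block codes, here is a minimal rate-1/2 convolutional encoder with constraint length K = 3 and the classic generator taps 7 and 5 in octal (a sketch only; real encoders also handle termination and tail bits):

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 convolutional encoder, K=3, generators 7 and 5 (octal).

    Each output pair depends on the current bit and the K-1 = 2
    previous bits (the encoder state), XOR-ed per generator taps.
    """
    state = [0, 0]                      # two memory cells, zero-initialized
    out = []
    for b in bits:
        window = [b] + state            # [b_t, b_{t-1}, b_{t-2}]
        out.append(sum(w * t for w, t in zip(window, g1)) % 2)
        out.append(sum(w * t for w, t in zip(window, g2)) % 2)
        state = [b, state[0]]           # shift the register
    return out

# Impulse response: a single 1 followed by zeros
print(conv_encode([1, 0, 0]))   # [1, 1, 1, 0, 1, 1]
```

The printed sequence is the encoder's impulse response, which fully characterizes a linear convolutional code.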

Detection vs. Correction

Mathematics

Error Detection

Recognizes that an error has occurred, but cannot fix it. Requires retransmission (ARQ).

  • Needs minimum distance d ≥ 2
  • Can detect up to d − 1 errors
  • Simpler, less redundancy needed
  • Examples: parity bits, CRC, checksums

Error Correction (FEC)

Locates and corrects errors without retransmission. Essential for one-way channels.

  • Needs minimum distance d ≥ 3
  • Can correct up to ⌊(d−1)/2⌋ errors
  • More redundancy, higher complexity
  • Examples: Hamming, Reed-Solomon, turbo codes

Detection: up to d_min − 1 errors   |   Correction: up to ⌊(d_min − 1) / 2⌋ errors

Tradeoff: A code with d_min = 5 can either detect 4 errors OR correct 2 errors, but not both simultaneously at those limits.
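
Both capability formulas are one-liners; a quick sketch (`capabilities` is an illustrative name):

```python
def capabilities(d_min):
    """Errors detectable and correctable for minimum distance d_min."""
    return d_min - 1, (d_min - 1) // 2   # (detect, correct)

for d in (1, 2, 3, 5, 7):
    detect, correct = capabilities(d)
    print(f"d_min = {d}: detect {detect}, correct {correct}")
```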

Codewords & Code Structure

Mathematics

A code C is a set of valid codewords. For a binary code of length n, C ⊆ {0,1}^n.

Example: Simple (4,2) Code

Message   Codeword
00        0000
01        0111
10        1011
11        1100

Key Properties

  • |C| = 2^k codewords for a binary (n,k) code
  • Rate R = k/n (efficiency)
  • Redundancy = n − k bits per codeword
  • Linear code: sum of any two codewords is also a codeword

A linear code forms a vector subspace of F_2^n. This algebraic structure enables efficient encoding and decoding.
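
The closure property is easy to verify for the (4,2) example code above (a minimal sketch; helper names are our own):

```python
from itertools import combinations

def xor_words(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return ''.join('1' if x != y else '0' for x, y in zip(a, b))

def is_linear(code):
    """A binary code is linear iff it contains 0 and is closed under XOR."""
    cs = set(code)
    if '0' * len(code[0]) not in cs:
        return False
    return all(xor_words(a, b) in cs for a, b in combinations(code, 2))

code = ['0000', '0111', '1011', '1100']
print(is_linear(code))   # True: e.g. 0111 XOR 1011 = 1100, also a codeword
```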

Code Rate

Mathematics
R = k / n

The code rate measures the proportion of information bits in each codeword.

Rate    Code               Notes
1/3     Repetition (3,1)   High redundancy
4/7     Hamming(7,4)       Good balance
11/15   Hamming(15,11)     Higher efficiency

Shannon's theorem: We need R < C for reliable communication. If we set R too high, no code can save us. If R is too low, we waste bandwidth.

Hamming Distance

Mathematics Code

The Hamming distance d(x, y) between two binary strings x and y of equal length is the number of positions at which they differ.

d(x, y) = |{i : x_i ≠ y_i}| = wt(x ⊕ y)

Examples

d(10101, 10000) = 2
d(11111, 00000) = 5
d(01010, 01010) = 0

Properties (Metric)

  • d(x, y) ≥ 0 (non-negative)
  • d(x, y) = 0 iff x = y
  • d(x, y) = d(y, x) (symmetric)
  • d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
def hamming_distance(x, y):
    """Compute Hamming distance between two bit strings."""
    if len(x) != len(y):
        raise ValueError("Strings must be equal length")
    return sum(a != b for a, b in zip(x, y))

# Examples
print(hamming_distance("10101", "10000"))  # 2
print(hamming_distance("11111", "00000"))  # 5

Minimum Distance

Mathematics

The minimum distance d_min of a code C is the smallest Hamming distance between any two distinct codewords:

d_min(C) = min { d(c_i, c_j) : c_i, c_j ∈ C, c_i ≠ c_j }

For linear codes, d_min equals the minimum Hamming weight of any non-zero codeword:

d_min = min { wt(c) : c ∈ C, c ≠ 0 }

Why Minimum Distance Matters

d_min   Errors Detected   Errors Corrected   Example
1       0                 0                  Uncoded
2       1                 0                  Parity check
3       2                 1                  Hamming(7,4)
5       4                 2                  Golay codes
7       6                 3                  BCH codes

Hamming Weight

Mathematics Code

The Hamming weight wt(x) of a binary string x is the number of non-zero positions — equivalently, the distance from x to the all-zeros string.

wt(x) = d(x, 0) = |{i : x_i ≠ 0}|
def hamming_weight(x):
    """Count the number of 1s in a binary string."""
    return sum(int(bit) for bit in x)

# For linear codes: d_min = minimum weight of non-zero codewords
def min_distance_linear(codewords):
    """Find minimum distance of a linear code."""
    non_zero = [c for c in codewords if any(b == '1' for b in c)]
    return min(hamming_weight(c) for c in non_zero)

# Example: simple (4,2) code
code = ['0000', '0111', '1011', '1100']
print(f"Minimum distance: {min_distance_linear(code)}")  # 2

For linear codes, computing d_min requires checking only 2^k − 1 codewords (minimum weight) instead of all C(2^k, 2) pairs.

Python: Distance Matrix

Code
import itertools

def code_distance_matrix(codewords):
    """Compute all pairwise Hamming distances in a code."""
    n = len(codewords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(codewords[i], codewords[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

def minimum_distance(codewords):
    """Find the minimum distance of a code."""
    min_d = float('inf')
    for c1, c2 in itertools.combinations(codewords, 2):
        d = hamming_distance(c1, c2)
        min_d = min(min_d, d)
    return min_d

# Example: Hamming(7,4) - some codewords
hamming_74 = ['0000000', '1000110', '0100101', '1100011',
              '0010011', '1010101', '0110110', '1110000']
print(f"Minimum distance: {minimum_distance(hamming_74)}")  # 3

The full Hamming(7,4) code has 2^4 = 16 codewords, every pair at distance at least 3. This enables single-error correction.

The Singleton Bound

Mathematics

The Singleton bound (1964) gives the simplest upper bound on minimum distance for any code.

d_min ≤ n − k + 1

Intuition

If we delete any d − 1 positions from every codeword, the shortened words must still be distinct (otherwise two codewords would agree in all remaining positions and differ in at most d − 1 positions, contradicting d_min = d).

So 2^k ≤ 2^(n−d+1), giving d ≤ n − k + 1.

MDS Codes

Codes that achieve the Singleton bound with equality are called Maximum Distance Separable (MDS) codes.

  • Reed-Solomon codes are MDS
  • Optimal error correction for given rate
  • Only exist over large enough alphabets

Example: For a [15, 11, d] code, Singleton says d ≤ 15 − 11 + 1 = 5.

The Hamming Bound

Mathematics

The Hamming bound (sphere-packing bound) considers how many error patterns of weight ≤ t can be packed into the space {0,1}^n.

2^k · Σ_{i=0}^{t} C(n, i) ≤ 2^n    where t = ⌊(d−1)/2⌋

Interpretation

Each codeword is the center of a "sphere" of radius t in Hamming space. These spheres must not overlap (otherwise decoding is ambiguous). The total volume of all spheres cannot exceed 2^n.

Perfect Codes

Codes achieving the Hamming bound with equality are called perfect codes. The Hamming spheres tile the entire space with no gaps.

  • Hamming codes (single-error correcting)
  • Binary Golay code [23, 12, 7]
  • Trivial: repetition codes, whole-space codes
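
Checking whether a parameter set meets the Hamming bound with equality is a one-line computation (a sketch; `is_perfect` is an illustrative name):

```python
from math import comb

def is_perfect(n, k, t):
    """True if 2^k * (volume of a radius-t Hamming ball) == 2^n exactly."""
    sphere = sum(comb(n, i) for i in range(t + 1))
    return 2**k * sphere == 2**n

print(is_perfect(7, 4, 1))     # Hamming(7,4): 16 * 8 = 128 = 2^7 -> True
print(is_perfect(23, 12, 3))   # binary Golay [23,12,7] -> True
print(is_perfect(15, 11, 1))   # Hamming(15,11): 2^11 * 16 = 2^15 -> True
print(is_perfect(7, 3, 1))     # 8 * 8 = 64 != 2^7 -> False
```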

The Plotkin Bound

Mathematics

The Plotkin bound (1960) is most useful when the minimum distance d is close to the block length n (high-redundancy regime).

If d is even and d > n/2:   |C| ≤ 2 · ⌊d / (2d − n)⌋

Simplified Form (for linear codes)

If 2d > n:   k ≤ n − 2d + 1 + log₂(2d)

The Plotkin bound becomes tighter than the Hamming bound when the code rate is low.

Key insight: no binary code with d_min > n/2 can have more than 2n codewords (a count linear in n, not exponential). The Plotkin bound quantifies how quickly the code size shrinks as d approaches n.
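
The even-d case of the bound is straightforward to evaluate (a sketch restricted to the condition stated above, d even with 2d > n):

```python
def plotkin_max_size(n, d):
    """Plotkin bound |C| <= 2*floor(d / (2d - n)) for even d with 2d > n."""
    assert d % 2 == 0 and 2 * d > n, "bound applies only when d is even and 2d > n"
    return 2 * (d // (2 * d - n))

print(plotkin_max_size(10, 6))   # 2*floor(6/2) = 6 codewords at most
print(plotkin_max_size(8, 6))    # 2*floor(6/4) = 2
```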

Gilbert-Varshamov Bound

Mathematics

Unlike the previous bounds (which are upper bounds), the Gilbert-Varshamov (GV) bound is a lower bound — it guarantees that good codes exist.

A binary linear [n, k, d] code exists if:   2^(n−k) > Σ_{i=0}^{d−2} C(n−1, i)

Greedy Argument

Build the code greedily: add a new codeword if it is at distance ≥ d from all existing codewords. The process can continue as long as the Hamming balls of radius d − 1 around the chosen codewords do not cover all of {0,1}^n.

Significance

The GV bound matches the channel capacity in certain asymptotic regimes, suggesting that random linear codes are nearly optimal. Most practical codes fall between the GV and Hamming bounds.

The GV bound is existential: it proves good codes exist but doesn't help construct them explicitly.
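
The GV condition can be checked directly (a sketch; `gv_guarantees` is an illustrative name):

```python
from math import comb

def gv_guarantees(n, k, d):
    """True if the GV bound guarantees a binary linear [n, k, d] code exists."""
    return 2**(n - k) > sum(comb(n - 1, i) for i in range(d - 1))  # i = 0..d-2

print(gv_guarantees(15, 5, 5))   # True:  2^10 = 1024 > 470
print(gv_guarantees(15, 7, 5))   # False: 2^8 = 256 is not > 470
# Sufficient, not necessary: the BCH [15, 7, 5] code exists anyway.
```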

Comparing the Bounds

Mathematics

For binary codes of length n, the bounds constrain the relationship between rate R = k/n and relative distance δ = d/n:

Bound               Type    Key Condition          Tightest When
Singleton           Upper   d ≤ n − k + 1          Over large alphabets
Hamming             Upper   Sphere-packing         Low error correction
Plotkin             Upper   Counting argument      δ > 1/2
Gilbert-Varshamov   Lower   Greedy construction    All ranges

from math import comb, log2

def hamming_bound_max_k(n, t):
    """Max k satisfying the Hamming bound for t-error correction."""
    sphere_size = sum(comb(n, i) for i in range(t + 1))
    return int(n - log2(sphere_size))

def singleton_bound_max_d(n, k):
    """Max d satisfying the Singleton bound."""
    return n - k + 1

# Example: n = 15
print(f"Hamming(15, t=1): max k = {hamming_bound_max_k(15, 1)}")  # 11
print(f"Singleton(15, k=11): max d = {singleton_bound_max_d(15, 11)}")  # 5

Timeline of Coding Theory

History
1948 — Shannon publishes "A Mathematical Theory of Communication"
1950 — Hamming publishes error-correcting codes; Golay discovers the [23,12,7] code
1954 — Muller and Reed develop Reed-Muller codes
1959 — Hocquenghem, Bose, and Chaudhuri develop BCH codes
1960 — Reed and Solomon invent Reed-Solomon codes; Peterson's CRC paper
1962 — Gallager invents LDPC codes (forgotten for 30 years!)
1967 — Viterbi algorithm for decoding convolutional codes
1993 — Berrou et al. invent turbo codes (near Shannon limit!)
1996 — MacKay & Neal rediscover LDPC codes
2009 — Arıkan invents polar codes (provably achieve capacity)

Summary & What's Next

Key Takeaways

  • Shannon's theorem proves reliable communication is possible at rates below channel capacity
  • Hamming distance is the fundamental metric for code analysis
  • Minimum distance determines detection and correction capability
  • The major bounds constrain what codes can and cannot achieve
  • Linear codes have rich algebraic structure enabling efficient algorithms

Coming Up

Lecture 02: Parity Checks & Simple Codes

Lecture 03: Hamming Codes

Lecture 04: CRC & Error Detection

The central question of coding theory: How do we design codes that approach the Shannon limit with practical encoding and decoding complexity?