Coding Theory Series — 01

Foundations of Coding Theory

From Shannon's theorem to the mathematics of reliable communication

Mathematics History Code

Roadmap

Origins

Telegraph errors, Bell Labs, and the birth of information theory

Core Theory

Noisy channels, capacity, and Shannon's theorems

Code Types

Block codes, convolutional codes, and their properties

Key Metrics

Hamming distance, code rate, minimum distance

Bounds

Singleton, Hamming, Plotkin, and Gilbert-Varshamov

Milestones

Timeline of major achievements in coding theory

The Problem of Noise

History

Since the earliest days of electrical communication, engineers faced a fundamental challenge: noise corrupts transmitted messages.

Telegraph Era (1840s)

Morse code operators would request "please repeat" when signals were garbled by line noise, lightning, or crosstalk between wires.

Telephone Era (1900s)

Long-distance calls suffered from static, fading, and signal degradation over copper lines spanning hundreds of miles.

The core question: Can we add structure to our messages so that errors introduced during transmission can be detected — or even corrected — automatically?

Claude Shannon (1916–2001)

History

The Father of Information Theory

In 1948, Claude Shannon, a mathematician and electrical engineer at Bell Labs, published "A Mathematical Theory of Communication" in the Bell System Technical Journal.

Shannon showed that information could be quantified and that reliable communication was possible even over noisy channels — up to a fundamental limit called channel capacity.

Key Contributions

  • Defined information entropy (bits)
  • Proved the noisy channel coding theorem
  • Established channel capacity as a fundamental limit
  • Showed error-free communication is theoretically possible

Richard Hamming (1915–1998)

History

Also at Bell Labs, Richard Hamming was a mathematician who grew frustrated with the unreliability of early computers. Electromechanical relay machines would fail on weekends when no operators were present.

The Frustration

"If the machine can detect an error, why can't it locate the position of the error and correct it?" — Hamming's famous question that launched error-correcting codes.

The Breakthrough

In 1950, Hamming published his landmark paper introducing Hamming codes — the first practical single-error-correcting codes. He also formalized the concept of Hamming distance.

Shannon proved reliable communication was possible; Hamming showed how to do it.

What Is Coding Theory?

Mathematics

Coding theory is the study of the properties of codes and their fitness for specific applications. It lies at the intersection of mathematics, computer science, and electrical engineering.

Source Coding

Compresses data to remove redundancy. Goal: represent information with fewer bits.

Examples: Huffman coding, LZW, arithmetic coding

Channel Coding

Adds controlled redundancy to detect/correct errors. Goal: reliable communication over noisy channels.

Examples: Hamming codes, Reed-Solomon, LDPC

This series focuses on channel coding — the art and science of adding just enough redundancy to combat errors without wasting bandwidth.

The Noisy Channel Model

Mathematics
Source (message m) → Encoder (m → c) → Channel (+ noise) → Decoder (r → m') → Sink (message m')

Encoder: Maps message m of k bits to codeword c of n bits (where n > k)

Channel: Introduces errors — bits may be flipped, erased, or inserted

Decoder: Receives noisy r and attempts to recover original m

Redundancy = n − k bits   |   Code Rate R = k / n
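
The pipeline above can be sketched end to end with the simplest channel code of all, a 3-fold repetition code (a minimal illustration of the encoder/channel/decoder roles, not a practical code; the helper names are our own):

```python
import random

def encode(message, reps=3):
    """Encoder: map each message bit to `reps` copies (k=1 -> n=3 per bit)."""
    return ''.join(bit * reps for bit in message)

def channel(codeword, p=0.1, seed=42):
    """Binary symmetric channel: flip each bit independently with probability p."""
    rng = random.Random(seed)
    return ''.join(('1' if b == '0' else '0') if rng.random() < p else b
                   for b in codeword)

def decode(received, reps=3):
    """Decoder: majority vote over each group of `reps` received bits."""
    groups = [received[i:i + reps] for i in range(0, len(received), reps)]
    return ''.join('1' if g.count('1') > reps // 2 else '0' for g in groups)

m = '1011'
c = encode(m)          # '111000111111' (n = 12, k = 4, R = 1/3)
r = channel(c)         # some bits flipped by noise
print(m, '->', c, '->', r, '->', decode(r))
```

Any single flipped bit per 3-bit group is corrected by the majority vote, at the cost of tripling the bandwidth (R = 1/3).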

Channel Capacity

Mathematics

Shannon defined channel capacity as the maximum rate at which information can be reliably transmitted over a noisy channel.

C = max_{p(x)} I(X; Y)    [bits per channel use]

Binary Symmetric Channel (BSC)

C = 1 − H(p) = 1 + p·log₂(p) + (1−p)·log₂(1−p)

where p is the crossover (bit-flip) probability and H(p) is the binary entropy function

Example: For a BSC with p = 0.1, capacity C ≈ 0.531 bits per use. You can reliably transmit at any rate R < C, but not at rates above C.
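
The BSC capacity formula is easy to evaluate numerically; a quick sketch (`bsc_capacity` is an illustrative name, not a library function):

```python
from math import log2

def bsc_capacity(p):
    """C = 1 - H(p) for a binary symmetric channel with crossover probability p."""
    if p in (0.0, 1.0):                           # H(0) = H(1) = 0
        return 1.0
    h = -p * log2(p) - (1 - p) * log2(1 - p)      # binary entropy H(p)
    return 1 - h

print(round(bsc_capacity(0.1), 3))   # 0.531 bits per use
print(bsc_capacity(0.5))             # 0.0 -- useless channel
print(bsc_capacity(0.0))             # 1.0 -- perfect channel
```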

Shannon's Noisy Channel Theorem

Mathematics History

The Theorem (1948)

For any rate R < C (channel capacity), there exist codes of rate R that achieve arbitrarily low error probability as the block length n → ∞.

Conversely, for any rate R > C, the error probability is bounded away from zero regardless of the code used.

The Good News

Error-free communication IS possible at rates below capacity. We don't need infinite bandwidth to overcome noise.

The Catch

Shannon's proof was non-constructive. He proved good codes exist using random coding arguments, but didn't show how to build them.

The search for practical codes that approach channel capacity drove 70+ years of research. LDPC and turbo codes finally achieved this in the 1990s–2000s.

Binary Symmetric Channel (BSC)

Mathematics

The BSC is the simplest and most studied channel model. Each bit is independently flipped with probability p.

Channel Diagram

0 —(1−p)—→ 0
0 ——(p)——→ 1
1 ——(p)——→ 0
1 —(1−p)—→ 1

Properties

  • Memoryless: errors are independent
  • Symmetric: P(0→1) = P(1→0) = p
  • Binary: input and output are {0, 1}
  • When p = 0.5, channel is useless (C = 0)
  • When p = 0, channel is perfect (C = 1)

P(received = r | sent = c) = p^d(c,r) · (1−p)^(n−d(c,r))
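
Because p < 1/2 makes p^d · (1−p)^(n−d) strictly decreasing in d, maximizing this likelihood is the same as minimizing Hamming distance, so maximum-likelihood decoding on a BSC is minimum-distance decoding. A small sketch (helper names are our own):

```python
def likelihood(c, r, p):
    """P(received = r | sent = c) on a BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(c, r))   # Hamming distance d(c, r)
    return p**d * (1 - p)**(len(c) - d)

code = ['0000', '0111', '1011', '1100']     # a small example code
r = '0101'                                  # received word

# Most likely codeword == nearest codeword (when p < 1/2)
best = max(code, key=lambda c: likelihood(c, r, p=0.1))
print(best)   # '0111', at distance 1 from r
```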

Types of Codes

Mathematics

Block Codes

Message is divided into fixed-length blocks of k bits. Each block is independently encoded into n bits.

  • Denoted (n, k) or [n, k, d]
  • Encoding is memoryless between blocks
  • Examples: Hamming, Reed-Solomon, BCH, LDPC

[n, k, d] code over F_q

Convolutional Codes

Message bits are encoded using a sliding window with memory. Output depends on current and past input bits.

  • Described by (n, k, K) where K = constraint length
  • Encoding has memory (state machine)
  • Decoded by Viterbi algorithm (1967)
Rate = k/n, Memory = K−1

This series focuses primarily on block codes, which are more naturally analyzed using algebraic structures (finite fields, polynomials, matrices).
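
For contrast with block codes, here is a minimal rate-1/2 convolutional encoder with constraint length K = 3 and the classic generator taps 7 and 5 in octal (a sketch only; real encoders also handle termination and tail bits):

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 convolutional encoder, K=3, generators 7 and 5 (octal).

    Each output pair depends on the current bit and the K-1 = 2
    previous bits (the encoder state), XOR-ed per generator taps.
    """
    state = [0, 0]                      # two memory cells, zero-initialized
    out = []
    for b in bits:
        window = [b] + state            # [b_t, b_{t-1}, b_{t-2}]
        out.append(sum(w * t for w, t in zip(window, g1)) % 2)
        out.append(sum(w * t for w, t in zip(window, g2)) % 2)
        state = [b, state[0]]           # shift the register
    return out

# Impulse response: a single 1 followed by zeros
print(conv_encode([1, 0, 0]))   # [1, 1, 1, 0, 1, 1]
```

The printed sequence is the encoder's impulse response, which fully characterizes a linear convolutional code.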

Detection vs. Correction

Mathematics

Error Detection

Recognizes that an error has occurred, but cannot fix it. Requires retransmission (ARQ).

  • Needs minimum distance d ≥ 2
  • Can detect up to d − 1 errors
  • Simpler, less redundancy needed
  • Examples: parity bits, CRC, checksums

Error Correction (FEC)

Locates and corrects errors without retransmission. Essential for one-way channels.

  • Needs minimum distance d ≥ 3
  • Can correct up to ⌊(d−1)/2⌋ errors
  • More redundancy, higher complexity
  • Examples: Hamming, Reed-Solomon, turbo codes

Detection: up to d_min − 1 errors   |   Correction: up to ⌊(d_min − 1) / 2⌋ errors

Tradeoff: A code with d_min = 5 can either detect 4 errors OR correct 2 errors, but not both simultaneously at those limits.
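
Both capability formulas are one-liners; a quick sketch (`capabilities` is an illustrative name):

```python
def capabilities(d_min):
    """Errors detectable and correctable for minimum distance d_min."""
    return d_min - 1, (d_min - 1) // 2   # (detect, correct)

for d in (1, 2, 3, 5, 7):
    detect, correct = capabilities(d)
    print(f"d_min = {d}: detect {detect}, correct {correct}")
```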

Codewords & Code Structure

Mathematics

A code C is a set of valid codewords. For a binary code of length n, C ⊆ {0,1}^n.

Example: Simple (4,2) Code

Message   Codeword
00        0000
01        0111
10        1011
11        1100

Key Properties

  • |C| = 2^k codewords for a binary (n,k) code
  • Rate R = k/n (efficiency)
  • Redundancy = n − k bits per codeword
  • Linear code: sum of any two codewords is also a codeword

A linear code forms a vector subspace of F_2^n. This algebraic structure enables efficient encoding and decoding.
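
The closure property is easy to verify for the (4,2) example code above (a minimal sketch; helper names are our own):

```python
from itertools import combinations

def xor_words(a, b):
    """Bitwise XOR of two equal-length bit strings."""
    return ''.join('1' if x != y else '0' for x, y in zip(a, b))

def is_linear(code):
    """A binary code is linear iff it contains 0 and is closed under XOR."""
    cs = set(code)
    if '0' * len(code[0]) not in cs:
        return False
    return all(xor_words(a, b) in cs for a, b in combinations(code, 2))

code = ['0000', '0111', '1011', '1100']
print(is_linear(code))   # True: e.g. 0111 XOR 1011 = 1100, also a codeword
```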

Code Rate

Mathematics
R = k / n

The code rate measures the proportion of information bits in each codeword.

Rate    Code               Notes
1/3     Repetition (3,1)   High redundancy
4/7     Hamming(7,4)       Good balance
11/15   Hamming(15,11)     Higher efficiency

Shannon's theorem: We need R < C for reliable communication. If we set R too high, no code can save us. If R is too low, we waste bandwidth.

Hamming Distance

Mathematics Code

The Hamming distance d(x, y) between two binary strings x and y of equal length is the number of positions at which they differ.

d(x, y) = |{i : x_i ≠ y_i}| = wt(x ⊕ y)

Examples

d(10101, 10000) = 2
d(11111, 00000) = 5
d(01010, 01010) = 0

Properties (Metric)

  • d(x, y) ≥ 0 (non-negative)
  • d(x, y) = 0 iff x = y
  • d(x, y) = d(y, x) (symmetric)
  • d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality)
def hamming_distance(x, y):
    """Compute Hamming distance between two bit strings."""
    if len(x) != len(y):
        raise ValueError("Strings must be equal length")
    return sum(a != b for a, b in zip(x, y))

# Examples
print(hamming_distance("10101", "10000"))  # 2
print(hamming_distance("11111", "00000"))  # 5

Minimum Distance

Mathematics

The minimum distance d_min of a code C is the smallest Hamming distance between any two distinct codewords:

d_min(C) = min { d(c_i, c_j) : c_i, c_j ∈ C, c_i ≠ c_j }

For linear codes, d_min equals the minimum Hamming weight of any non-zero codeword:

d_min = min { wt(c) : c ∈ C, c ≠ 0 }

Why Minimum Distance Matters

d_min   Errors Detected   Errors Corrected   Example
1       0                 0                  Uncoded
2       1                 0                  Parity check
3       2                 1                  Hamming(7,4)
5       4                 2                  Golay codes
7       6                 3                  BCH codes

Hamming Weight

Mathematics Code

The Hamming weight wt(x) of a binary string x is the number of non-zero positions — equivalently, the distance from x to the all-zeros string.

wt(x) = d(x, 0) = |{i : x_i ≠ 0}|
def hamming_weight(x):
    """Count the number of 1s in a binary string."""
    return sum(int(bit) for bit in x)

# For linear codes: d_min = minimum weight of non-zero codewords
def min_distance_linear(codewords):
    """Find minimum distance of a linear code."""
    non_zero = [c for c in codewords if any(b == '1' for b in c)]
    return min(hamming_weight(c) for c in non_zero)

# Example: simple (4,2) code
code = ['0000', '0111', '1011', '1100']
print(f"Minimum distance: {min_distance_linear(code)}")  # 2

For linear codes, computing d_min requires checking only 2^k − 1 codewords (minimum weight) instead of all C(2^k, 2) pairs.

Python: Distance Matrix

Code
import itertools

def code_distance_matrix(codewords):
    """Compute all pairwise Hamming distances in a code."""
    n = len(codewords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming_distance(codewords[i], codewords[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

def minimum_distance(codewords):
    """Find the minimum distance of a code."""
    min_d = float('inf')
    for c1, c2 in itertools.combinations(codewords, 2):
        d = hamming_distance(c1, c2)
        min_d = min(min_d, d)
    return min_d

# Example: Hamming(7,4) - some codewords
hamming_74 = ['0000000', '1000110', '0100101', '1100011',
              '0010011', '1010101', '0110110', '1110000']
print(f"Minimum distance: {minimum_distance(hamming_74)}")  # 3

The full Hamming(7,4) code has 2^4 = 16 codewords, every pair at distance at least 3. This enables single-error correction.

The Singleton Bound

Mathematics

The Singleton bound (1964) gives the simplest upper bound on minimum distance for any code.

d_min ≤ n − k + 1

Intuition

If we delete any d − 1 positions from every codeword, the shortened words must still be distinct (otherwise two codewords would agree in all remaining positions and differ in at most d − 1 positions, contradicting d_min = d).

So 2^k ≤ 2^(n−d+1), giving d ≤ n − k + 1.

MDS Codes

Codes that achieve the Singleton bound with equality are called Maximum Distance Separable (MDS) codes.

  • Reed-Solomon codes are MDS
  • Optimal error correction for given rate
  • Only exist over large enough alphabets

Example: For a [15, 11, d] code, Singleton says d ≤ 15 − 11 + 1 = 5.

The Hamming Bound

Mathematics

The Hamming bound (sphere-packing bound) considers how many error patterns of weight ≤ t can be packed into the space {0,1}^n.

2^k · Σ_{i=0}^{t} C(n, i) ≤ 2^n    where t = ⌊(d−1)/2⌋

Interpretation

Each codeword is the center of a "sphere" of radius t in Hamming space. These spheres must not overlap (otherwise decoding is ambiguous). The total volume of all spheres cannot exceed 2^n.

Perfect Codes

Codes achieving the Hamming bound with equality are called perfect codes. The Hamming spheres tile the entire space with no gaps.

  • Hamming codes (single-error correcting)
  • Binary Golay code [23, 12, 7]
  • Trivial: repetition codes, whole-space codes
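
Checking whether a parameter set meets the Hamming bound with equality is a one-line computation (a sketch; `is_perfect` is an illustrative name):

```python
from math import comb

def is_perfect(n, k, t):
    """True if 2^k * (volume of a radius-t Hamming ball) == 2^n exactly."""
    sphere = sum(comb(n, i) for i in range(t + 1))
    return 2**k * sphere == 2**n

print(is_perfect(7, 4, 1))     # Hamming(7,4): 16 * 8 = 128 = 2^7 -> True
print(is_perfect(23, 12, 3))   # binary Golay [23,12,7] -> True
print(is_perfect(15, 11, 1))   # Hamming(15,11): 2^11 * 16 = 2^15 -> True
print(is_perfect(7, 3, 1))     # 8 * 8 = 64 != 2^7 -> False
```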

The Plotkin Bound

Mathematics

The Plotkin bound (1960) is most useful when the minimum distance d is close to the block length n (high-redundancy regime).

If d is even and d > n/2:   |C| ≤ 2 · ⌊d / (2d − n)⌋

Simplified Form (for linear codes)

If 2d > n:   k ≤ n − 2d + 1 + log₂(2d)

The Plotkin bound becomes tighter than the Hamming bound when the code rate is low.

Key insight: no binary code with d_min > n/2 can have more than 2n codewords (a count linear in n, not exponential). The Plotkin bound quantifies how quickly the code size shrinks as d approaches n.
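
The even-d case of the bound is straightforward to evaluate (a sketch restricted to the condition stated above, d even with 2d > n):

```python
def plotkin_max_size(n, d):
    """Plotkin bound |C| <= 2*floor(d / (2d - n)) for even d with 2d > n."""
    assert d % 2 == 0 and 2 * d > n, "bound applies only when d is even and 2d > n"
    return 2 * (d // (2 * d - n))

print(plotkin_max_size(10, 6))   # 2*floor(6/2) = 6 codewords at most
print(plotkin_max_size(8, 6))    # 2*floor(6/4) = 2
```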

Gilbert-Varshamov Bound

Mathematics

Unlike the previous bounds (which are upper bounds), the Gilbert-Varshamov (GV) bound is a lower bound — it guarantees that good codes exist.

A binary linear [n, k, d] code exists if:   2^(n−k) > Σ_{i=0}^{d−2} C(n−1, i)

Greedy Argument

Build the code greedily: add a new codeword if it is at distance ≥ d from all existing codewords. The process can continue as long as the Hamming balls of radius d − 1 around the chosen codewords do not cover all of {0,1}^n.

Significance

The GV bound matches the channel capacity in certain asymptotic regimes, suggesting that random linear codes are nearly optimal. Most practical codes fall between the GV and Hamming bounds.

The GV bound is existential: it proves good codes exist but doesn't help construct them explicitly.
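
The GV condition can be checked directly (a sketch; `gv_guarantees` is an illustrative name):

```python
from math import comb

def gv_guarantees(n, k, d):
    """True if the GV bound guarantees a binary linear [n, k, d] code exists."""
    return 2**(n - k) > sum(comb(n - 1, i) for i in range(d - 1))  # i = 0..d-2

print(gv_guarantees(15, 5, 5))   # True:  2^10 = 1024 > 470
print(gv_guarantees(15, 7, 5))   # False: 2^8 = 256 is not > 470
# Sufficient, not necessary: the BCH [15, 7, 5] code exists anyway.
```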

Comparing the Bounds

Mathematics

For binary codes of length n, the bounds constrain the relationship between rate R = k/n and relative distance δ = d/n:

Bound               Type    Key Condition          Tightest When
Singleton           Upper   d ≤ n − k + 1          Over large alphabets
Hamming             Upper   Sphere-packing         Low error correction
Plotkin             Upper   Counting argument      δ > 1/2
Gilbert-Varshamov   Lower   Greedy construction    All ranges

from math import comb, log2

def hamming_bound_max_k(n, t):
    """Max k satisfying the Hamming bound for t-error correction."""
    sphere_size = sum(comb(n, i) for i in range(t + 1))
    return int(n - log2(sphere_size))

def singleton_bound_max_d(n, k):
    """Max d satisfying the Singleton bound."""
    return n - k + 1

# Example: n = 15
print(f"Hamming(15, t=1): max k = {hamming_bound_max_k(15, 1)}")  # 11
print(f"Singleton(15, k=11): max d = {singleton_bound_max_d(15, 11)}")  # 5

Timeline of Coding Theory

History
1948 — Shannon publishes "A Mathematical Theory of Communication"
1950 — Hamming publishes error-correcting codes; Golay discovers the [23,12,7] code
1954 — Muller and Reed develop Reed-Muller codes
1959 — Hocquenghem, Bose, and Chaudhuri develop BCH codes
1960 — Reed and Solomon invent Reed-Solomon codes; Peterson's CRC paper
1962 — Gallager invents LDPC codes (forgotten for 30 years!)
1967 — Viterbi algorithm for decoding convolutional codes
1993 — Berrou et al. invent turbo codes (near Shannon limit!)
1996 — MacKay & Neal rediscover LDPC codes
2009 — Arıkan invents polar codes (provably achieve capacity)

Summary & What's Next

Key Takeaways

  • Shannon's theorem proves reliable communication is possible at rates below channel capacity
  • Hamming distance is the fundamental metric for code analysis
  • Minimum distance determines detection and correction capability
  • The major bounds constrain what codes can and cannot achieve
  • Linear codes have rich algebraic structure enabling efficient algorithms

Coming Up

Lecture 02: Parity Checks & Simple Codes

Lecture 03: Hamming Codes

Lecture 04: CRC & Error Detection

The central question of coding theory: How do we design codes that approach the Shannon limit with practical encoding and decoding complexity?