Coding Theory Series — 08

Reed-Solomon Codes

The most widely deployed error correction code in history — from deep space to your pocket

Mathematics Code History Applications Standards

Roadmap

Origins

Reed & Solomon at MIT Lincoln Lab, 1960

Theory

GF(2m) arithmetic, MDS property, code parameters

Encoding

Generator polynomial, systematic encoding, worked examples

Decoding

Full pipeline: syndromes, BM, Forney, Chien

Applications

CDs, QR codes, deep space, RAID 6

Implementation

Python RS encoder/decoder over GF(28)

Historical Origins

History

Irving S. Reed (1923-2012)

Born in Seattle. Caltech PhD, worked at MIT Lincoln Lab. Also co-invented the Reed-Muller codes (1954).

Later, Reed contributed to JPEG image compression and automatic target recognition.

Gustave Solomon (1930-1996)

At MIT Lincoln Lab and JPL. The 1960 paper "Polynomial Codes over Certain Finite Fields" was just 3 pages — one of the most impactful short papers in engineering history.

The original encoding was based on polynomial evaluation, not the generator polynomial approach used today.

The irony: Reed and Solomon's original decoding method was impractical. It took years for Berlekamp (1968), Massey (1969), and Forney (1965) to develop efficient decoders that made RS codes usable.

RS Codes as Non-Binary BCH

Mathematics

The Key Distinction

Reed-Solomon codes are BCH codes where the alphabet and the root field are the same.

Binary BCH

Codeword symbols: GF(2)

Roots of g(x): GF(2m)

Different fields!

Minimal polynomial degree can be up to m, so g(x) degree grows fast.

Reed-Solomon

Codeword symbols: GF(2m)

Roots of g(x): GF(2m)

Same field!

Every root has minimal polynomial of degree 1: (x - αi). So g(x) = product of linear factors.

Consequence: RS codes achieve d = n - k + 1, the Singleton bound. They are Maximum Distance Separable (MDS) — no code can do better for given n and k!

Finite Field Arithmetic: GF(28)

Mathematics

The Standard Field for RS Codes

GF(28) = GF(256): each element is a byte (8 bits).

Irreducible polynomial: p(x) = x8 + x4 + x3 + x2 + 1   (0x11D)

This is the polynomial used in AES, QR codes, and many RS implementations.

Addition

XOR of bytes — trivial and fast:

0x53 + 0xCA = 0x99
01010011 ⊕ 11001010 = 10011001

No carries, no overflow. a + a = 0 for all a.

Multiplication

Polynomial multiplication mod p(x):

a · b = antilog(log(a) + log(b))

Using log/antilog tables makes multiplication a table lookup + addition mod 255.

GF(28) Worked Examples

Mathematics Code

Generating the Log/Antilog Tables

Start with α = 0x02 (the primitive element x), repeatedly multiply:

PowerValue (hex)Value (binary)
α0 = 10x0100000001
α10x0200000010
α20x0400000100
α70x8010000000
α80x1D00011101

α8 = α4 + α3 + α2 + 1 = 0x1D (from the reduction: x8 mod p(x)).

All 255 nonzero elements of GF(256) are generated as powers of α. The log table maps value → exponent, antilog maps exponent → value. Both are 256-byte lookup tables.

RS Code Parameters

Mathematics

RS(n, k) over GF(q)

n
Code length
n = q - 1 symbols
k
Message length
k symbols
n-k+1
Min distance (MDS!)
d = n - k + 1
Error correction: t = ⌊(d-1)/2⌋ = ⌊(n-k)/2⌋ symbol errors

Common RS Codes

CodeFieldParityCorrectionUse
RS(255, 223)GF(28)3216 symbolsDeep space
RS(255, 239)GF(28)168 symbolsDVB
RS(255, 249)GF(28)63 symbolsDiskStorage
RS(204, 188)GF(28)168 symbolsDVB-T
RS(28, 24)GF(28)42 symbolsCD (C1)

The MDS Property

Mathematics

Maximum Distance Separable

dmin = n - k + 1    (Singleton bound met with EQUALITY)

No code with the same n and k can have a larger minimum distance. RS codes are optimal in this sense.

What MDS Means Practically

  • Every n-k+1 symbol positions carry independent information
  • Any k of n received symbols suffice to recover the message
  • Maximum possible erasure correction: n-k erasures
  • Maximum error correction: ⌊(n-k)/2⌋ errors

Trade-off Formula

With s erasures and e errors:

2e + s ≤ n - k = d - 1

Each error "costs" 2 (unknown position + unknown value). Each erasure "costs" 1 (known position, unknown value).

Generator Polynomial

Mathematics

Construction

The generator polynomial for RS(n, k) has 2t = n-k consecutive roots:

g(x) = Πi=12t (x - αi) = (x - α)(x - α2)...(x - α2t)

Each factor is linear (degree 1) since roots are in the same field as coefficients!

Example: RS(255, 223) — 2t = 32

g(x) = (x - α)(x - α2)(x - α3)...(x - α32)

A degree-32 polynomial over GF(256). Coefficients are bytes.

First few coefficients: g(x) = x32 + 0x10·x31 + 0x77·x30 + ...

Why consecutive roots? The BCH bound guarantees d ≥ 2t + 1. Since each root contributes exactly degree 1, we get deg(g) = 2t = n - k, which gives d = n - k + 1 (MDS).

Encoding: Systematic Method

Mathematics Code

Polynomial Long Division Approach

Step 1: Represent message as polynomial m(x) of degree k-1

Step 2: Shift message: x2t · m(x)

Step 3: Compute remainder: r(x) = [x2t · m(x)] mod g(x)

Step 4: Codeword: c(x) = x2t · m(x) - r(x)

c(x) = x2t · m(x) + r(x)     (since -1 = 1 in GF(2m))
Systematic codeword: [parity2t ... parity1 | msgk ... msg1]
The k message symbols appear unchanged in the codeword — easy to extract after decoding.
Alternative — evaluation encoding: Reed and Solomon's original method evaluates m(x) at n points. Equivalent code, but not systematic by default.

Step-by-Step Encoding: RS(7, 3) over GF(23)

Mathematics

Small Example (GF(8) for clarity)

GF(8): p(x) = x3 + x + 1, α3 = α + 1

RS(7, 3): t = 2, g(x) = (x-α)(x-α2)(x-α3)(x-α4)

Message: m = [α2, α0, α4] → m(x) = α2x2 + x + α4

Step 1: x4·m(x) = α2x6 + x5 + α4x4

Step 2: Divide by g(x) using GF(8) polynomial long division

Step 3: Remainder r(x) = r3x3 + r2x2 + r1x + r0

Step 4: Codeword c(x) = x4·m(x) + r(x)

c = [r0, r1, r2, r3, α4, 1, α2]
|--- parity ---| |-- message --|
Verification: c(αi) = 0 for i = 1,2,3,4. Each evaluation involves GF(8) arithmetic.

RS Decoding Pipeline

Mathematics
r(x)
received
Syndromes
S₁...S₂ₜ
BM Algorithm
σ(x)
Chien Search
locations
Forney
values
Correct
ĉ(x)

Key Difference from Binary BCH

Binary BCH only needs error positions (flip the bit). RS needs both positions and values — what is the correct symbol value?

This is why RS decoding adds Forney's algorithm.

Computational Cost

StepComplexity
SyndromesO(n · 2t)
Berlekamp-MasseyO(t2)
Chien searchO(n · t)
ForneyO(t2)

Syndrome Computation

Mathematics

Evaluating r(x) at Roots of g(x)

Sj = r(αj) = Σi=0n-1 ri · (αj)i,    j = 1, 2, ..., 2t

Since c(αj) = 0 for valid codewords, the syndrome depends only on errors:

Sj = e(αj) = Σl=1v Yl · Xlj

Notation

  • v = number of errors
  • Xl = αil = error locator (position)
  • Yl = error value (magnitude) at position il

All Sj = 0?

If all 2t syndromes are zero, the received word is a valid codeword — no errors detected (or an undetectable error pattern, which requires ≥ d errors).

Berlekamp-Massey Algorithm

Mathematics

Finding the Error-Locator Polynomial σ(x)

Iteratively builds σ(x) = 1 + σ1x + σ2x2 + ... + σvxv whose roots are Xl-1.

def berlekamp_massey(syndromes, t, gf):
    """Berlekamp-Massey algorithm over GF(2^m)."""
    n = 2 * t
    sigma = [1]        # error locator polynomial
    old_sigma = [1]    # previous sigma
    L = 0              # current LFSR length

    for r in range(n):
        # Compute discrepancy
        delta = syndromes[r]
        for j in range(1, len(sigma)):
            delta = gf.add(delta, gf.mul(sigma[j], syndromes[r - j]))

        # Shift old_sigma: multiply by x
        old_sigma = [0] + old_sigma

        if delta == 0:
            continue

        # Update sigma
        new_sigma = list(sigma)
        for j in range(len(old_sigma)):
            if j < len(new_sigma):
                new_sigma[j] = gf.add(new_sigma[j],
                                       gf.mul(delta, old_sigma[j]))
            else:
                new_sigma.append(gf.mul(delta, old_sigma[j]))

        if 2 * L <= r:
            # Length change
            old_sigma = [gf.mul(gf.inv(delta), s) for s in sigma]
            # Pad to account for the shift we already did
            old_sigma = [0] + old_sigma
            L = r + 1 - L

        sigma = new_sigma

    return sigma

BM Algorithm: Worked Example

Mathematics

RS(7, 3) over GF(8), 2 errors

Suppose syndromes: S₁ = α3, S₂ = α0, S₃ = α5, S₄ = α2

Iteration rΔσ(x)LAction
0α31 + α3x1Length update
1α61 + α5x + α6x22Length update
20unchanged2No change
30unchanged2No change
Result: σ(x) = 1 + σ₁x + σ₂x2 has degree 2, confirming v = 2 errors. Now find roots via Chien search to locate error positions.

Forney's Algorithm

Mathematics

Computing Error Values

Once we know the error positions Xl, we need the error magnitudes Yl.

Error Evaluator Polynomial

Ω(x) = S(x) · σ(x) mod x2t

where S(x) = S₁ + S₂x + ... + S₂ₜx2t-1 is the syndrome polynomial.

Forney's Formula

Yl = - Ω(Xl-1) / σ'(Xl-1)

σ'(x) is the formal derivative of σ(x).

In characteristic 2: σ'(x) = σ₁ + σ₃x2 + σ₅x4 + ... (only odd-index coefficients survive).

Elegance: Forney's algorithm avoids solving a linear system for the error values. It computes each Yl independently using only polynomial evaluation.

Chien Search

Mathematics Code

Finding Error Locations

Evaluate σ(x) at x = α0, α-1, α-2, ..., α-(n-1):

If σ(α-i) = 0, then position i has an error
def chien_search(sigma, n, gf):
    """Find roots of sigma(x) by exhaustive evaluation."""
    error_positions = []
    for i in range(n):
        # Evaluate sigma at alpha^(-i)
        x_inv = gf.exp_table[(255 - i) % 255]  # alpha^(-i)
        val = 0
        x_power = 1
        for coeff in sigma:
            val = gf.add(val, gf.mul(coeff, x_power))
            x_power = gf.mul(x_power, x_inv)
        if val == 0:
            error_positions.append(i)

    if len(error_positions) != len(sigma) - 1:
        return None  # Decoding failure!
    return error_positions
Decoding failure: If the number of roots found doesn't match deg(σ), the received word has more errors than the code can correct. Report failure — don't output incorrect data!

Erasure & Error+Erasure Decoding

Mathematics

Erasure vs. Error

Error

Unknown position, unknown value.

Costs 2 in the budget: 2e ≤ d-1

Decoder must find both WHERE and WHAT.

Erasure

Known position, unknown value.

Costs 1 in the budget: s ≤ d-1

Position is flagged by demodulator. Only need to find WHAT.

Combined:   2e + s ≤ n - k    (can correct e errors AND s erasures simultaneously)

Why Erasures Matter

  • CD players know which samples are unreliable (scratches, fingerprints)
  • Demodulators can flag low-confidence symbols
  • Interleaving converts burst errors into erasures
  • RS(255, 223) can correct 16 errors OR 32 erasures OR any combination with 2e+s ≤ 32

RS(255, 223) in Deep Space

Applications History

The Standard Deep Space Code

223
Data bytes per block
32
Parity bytes (16 symbol correction)

Concatenated Coding Scheme

Data
RS(255,223)
outer code
Interleave
depth 5
Conv K=7
inner code
Channel
deep space

CCSDS standard adopted by NASA, ESA, JAXA. Used from Voyager to Mars missions.

Performance: Within ~2 dB of Shannon limit. The inner convolutional code corrects most errors; the outer RS code catches the rare bursts that slip through.

CD/DVD: CIRC

Applications Standards

Cross-Interleaved Reed-Solomon Coding

The Compact Disc (1982) uses a brilliant two-layer RS scheme designed by Philips and Sony:

Audio
samples
C2 Encoder
RS(28,24)
Interleave
28 delays
C1 Encoder
RS(32,28)
Disc

C1 (Inner Code)

RS(32, 28) over GF(256)

Corrects 2 byte errors per frame

Handles random errors from disc read

C2 (Outer Code)

RS(28, 24) over GF(256)

Corrects 2 byte errors or 4 erasures

Handles burst errors (scratches up to ~2.4mm)

Result: A CD can handle burst errors up to ~4000 consecutive bits (~2.4mm scratch) and still play perfectly. Up to ~7.7mm with concealment!

QR Codes & Reed-Solomon

Applications Standards

Error Correction Levels

LevelRecoveryUse Case
L (Low)~7%Clean environments
M (Medium)~15%General purpose
Q (Quartile)~25%Industrial
H (High)~30%Harsh environments

How It Works

QR data is encoded as bytes (GF(256)). RS code adds parity bytes. Higher EC level = more parity = less data capacity but more resilience.

Artistic QR codes: Level H allows 30% of the code to be damaged/covered — which is why you can put logos in the center of QR codes and they still scan!

QR codes use RS over GF(256) with p(x) = x8 + x4 + x3 + x2 + 1 — the same field as our examples.

Reed-Solomon in RAID 6

Applications Standards

Surviving Two Disk Failures

RAID 6 uses an RS-like code across disk drives:

RAID 5: Single Parity

P = D₁ ⊕ D₂ ⊕ ... ⊕ Dn

Survives 1 disk failure. Simple XOR.

RAID 6: Dual Parity

P = D₁ ⊕ D₂ ⊕ ... ⊕ Dn

Q = g₁·D₁ ⊕ g₂·D₂ ⊕ ... ⊕ gn·Dn

Where gi = αi in GF(256)!

This is exactly an RS code! Each "symbol" is a disk stripe. The P and Q are two RS parity symbols. When any two disks fail, we have two erasures — RS decoding recovers both.

Scale: Modern storage arrays with 20+ disks rely on RAID 6 (or triple-parity RAID) to avoid data loss. The probability of simultaneous failures during rebuild is non-negligible with large drives.

Python: GF(28) Arithmetic

Code
class GF256:
    """Galois Field GF(2^8) with primitive polynomial x^8+x^4+x^3+x^2+1."""
    def __init__(self):
        self.exp_table = [0] * 512  # double-sized for convenience
        self.log_table = [0] * 256
        # Generate tables using primitive element alpha = 2
        x = 1
        for i in range(255):
            self.exp_table[i] = x
            self.log_table[x] = i
            x <<= 1  # multiply by alpha
            if x & 0x100:  # if x >= 256, reduce mod p(x)
                x ^= 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1
        # Extend exp table for easy mod-free multiplication
        for i in range(255, 512):
            self.exp_table[i] = self.exp_table[i - 255]

    def add(self, a, b):
        return a ^ b  # addition = XOR

    def sub(self, a, b):
        return a ^ b  # same as addition in char 2

    def mul(self, a, b):
        if a == 0 or b == 0: return 0
        return self.exp_table[self.log_table[a] + self.log_table[b]]

    def div(self, a, b):
        if b == 0: raise ZeroDivisionError
        if a == 0: return 0
        return self.exp_table[(self.log_table[a] - self.log_table[b]) % 255]

    def inv(self, a):
        if a == 0: raise ZeroDivisionError
        return self.exp_table[255 - self.log_table[a]]

    def pow(self, a, n):
        if a == 0: return 0
        return self.exp_table[(self.log_table[a] * n) % 255]

Python: RS Encoder

Code
class ReedSolomon:
    """Reed-Solomon encoder/decoder over GF(256)."""
    def __init__(self, n=255, k=223):
        self.n = n
        self.k = k
        self.t = (n - k) // 2  # error correction capability
        self.gf = GF256()
        self.generator = self._build_generator()

    def _build_generator(self):
        """Build generator polynomial g(x) = prod(x - alpha^i)."""
        g = [1]
        for i in range(1, 2 * self.t + 1):
            # Multiply g by (x - alpha^i)
            alpha_i = self.gf.exp_table[i]
            new_g = [0] * (len(g) + 1)
            for j in range(len(g)):
                new_g[j] = self.gf.add(new_g[j], g[j])
                new_g[j + 1] = self.gf.add(new_g[j + 1],
                                             self.gf.mul(g[j], alpha_i))
            g = new_g
        return g

    def encode(self, message):
        """Systematic RS encoding."""
        assert len(message) == self.k
        # Multiply message by x^(n-k)
        msg_padded = message + [0] * (self.n - self.k)
        # Compute remainder of msg_padded / generator
        remainder = list(msg_padded)
        for i in range(self.k):
            if remainder[i] != 0:
                coef = remainder[i]
                for j in range(len(self.generator)):
                    remainder[i + j] = self.gf.add(
                        remainder[i + j],
                        self.gf.mul(self.generator[j], coef))
        # Codeword = message + parity
        parity = remainder[self.k:]
        return message + parity

Python: RS Decoder

Code
def decode(self, received):
    """Decode received RS codeword, correct up to t errors."""
    # Step 1: Compute syndromes
    syndromes = []
    for i in range(1, 2 * self.t + 1):
        s = 0
        for j in range(self.n):
            s = self.gf.add(s, self.gf.mul(received[j],
                            self.gf.pow(self.gf.exp_table[i], j)))
        syndromes.append(s)

    if all(s == 0 for s in syndromes):
        return received[:self.k]  # No errors

    # Step 2: Berlekamp-Massey -> error locator sigma(x)
    sigma = self._berlekamp_massey(syndromes)

    # Step 3: Chien search -> error positions
    positions = self._chien_search(sigma)
    if positions is None:
        raise ValueError("Decoding failure: too many errors")

    # Step 4: Forney's algorithm -> error values
    omega = self._error_evaluator(syndromes, sigma)
    sigma_deriv = self._formal_derivative(sigma)

    # Step 5: Compute error values and correct
    corrected = list(received)
    for pos in positions:
        x_inv = self.gf.inv(self.gf.exp_table[pos])
        # Evaluate omega and sigma' at x_inv
        omega_val = self._poly_eval(omega, x_inv)
        sigma_d_val = self._poly_eval(sigma_deriv, x_inv)
        # Error value
        error_val = self.gf.mul(self.gf.exp_table[pos],
                                self.gf.div(omega_val, sigma_d_val))
        corrected[pos] = self.gf.add(corrected[pos], error_val)

    return corrected[:self.k]

Python: RS Encode/Decode Demo

Code
# Create RS(255, 223) codec
rs = ReedSolomon(n=255, k=223)

# Create a test message (223 bytes)
message = list(range(223))  # [0, 1, 2, ..., 222]

# Encode
codeword = rs.encode(message)
print(f"Message length:  {len(message)}")
print(f"Codeword length: {len(codeword)}")
print(f"Parity bytes:    {len(codeword) - len(message)}")

# Introduce 16 random symbol errors (maximum correctable)
import random
received = list(codeword)
error_positions = random.sample(range(255), 16)
for pos in error_positions:
    received[pos] ^= random.randint(1, 255)  # corrupt byte

errors = sum(a != b for a, b in zip(codeword, received))
print(f"\nErrors introduced: {errors}")

# Decode
try:
    decoded = rs.decode(received)
    if decoded == message:
        print("Decoding SUCCESS - all 16 errors corrected!")
    else:
        print("Decoding produced wrong result")
except ValueError as e:
    print(f"Decoding failed: {e}")

# Test with 17 errors (beyond correction capability)
received2 = list(codeword)
for pos in random.sample(range(255), 17):
    received2[pos] ^= random.randint(1, 255)
try:
    rs.decode(received2)
    print("WARNING: 17 errors decoded (miscorrection!)")
except ValueError:
    print("17 errors correctly detected as uncorrectable")

RS Performance & Comparison

Mathematics Applications

Why RS Codes Dominate

PropertyRS CodesBinary BCHConvolutional
Symbol size m-bit symbols Individual bits Individual bits
Burst error handling Excellent Poor Moderate
MDS? Yes! No N/A
Random errors Good Better per bit Best (soft)
Decoding complexity O(n·t + t2) O(n·t + t2) O(2K·n)
RS excels at burst errors: A burst of 8m consecutive bit errors corrupts at most 9 symbols — RS corrects this easily. Binary codes would need to correct 8m individual bit errors!

Why RS is "The Most Widely Deployed Code"

Applications Standards
1982

Compact Disc
First mass RS deployment

1997

DVD
Enhanced CIRC

2000s

QR Codes
Mobile revolution

Reasons for Ubiquity

  • MDS optimality: best possible distance
  • Burst error resilience: handles real-world noise
  • Flexible parameters: adjustable n, k, t
  • Mature decoders: 50+ years of optimization
  • Hardware friendly: GF(28) = byte operations

Where You'll Find RS Today

  • Every CD, DVD, Blu-ray ever made
  • Every QR code ever scanned
  • Every digital TV broadcast (DVB)
  • Deep space communications (NASA/ESA)
  • RAID storage systems
  • DSL/ADSL broadband
  • Barcodes (DataMatrix, PDF417)
  • Flash memory controllers

Summary

Theory

  • Non-binary BCH over GF(2m)
  • MDS: d = n - k + 1 (optimal!)
  • Corrects t = (n-k)/2 symbol errors
  • Or any combination: 2e + s ≤ n - k

Decoding Pipeline

  • Syndromes → Berlekamp-Massey → Chien → Forney
  • Total: O(n·t + t2) — efficient!
  • Handles errors AND erasures
  • Detects uncorrectable patterns

The RS Legacy

1960
Invented
1968
BM decoder
1982
CD launch
Today
Everywhere
Reed-Solomon codes demonstrate how elegant mathematics becomes indispensable engineering. A 3-page paper from 1960 now protects trillions of bytes of data every second, worldwide.