Coding Theory Series — 05

Linear Block Codes

The algebraic backbone of error correction — generator matrices, syndrome decoding, and the geometry of codewords

Mathematics Code

Use arrow keys or swipe to navigate • ESC for overview

Roadmap

Foundations

Vector spaces over GF(2), linear code definition, [n,k,d] notation

Matrices

Generator matrix G, parity-check matrix H, systematic form

Encoding

Matrix multiplication over GF(2), systematic encoding

Decoding

Syndrome decoding, standard arrays, coset leaders, ML decoding

Theory

Dual codes, weight distributions, MacWilliams identity

Bounds & Examples

Singleton, Hamming, Plotkin bounds; classic linear codes

Vector Spaces over GF(2)

Mathematics

The Binary Field GF(2)

GF(2) = {0, 1} with arithmetic modulo 2:

0 + 0 = 0 0 + 1 = 1 1 + 1 = 0
0 × 0 = 0 0 × 1 = 0 1 × 1 = 1

Addition = XOR, Multiplication = AND

Note: subtraction is the same as addition in GF(2), since -1 = 1

GF(2)ⁿ as a Vector Space

The set of all binary n-tuples forms a vector space over GF(2):

Dimension: n
Size: 2ⁿ vectors
Addition: componentwise XOR
Scalar mult: trivial (0 or 1)

Example: GF(2)³ = {000, 001, 010, 011, 100, 101, 110, 111}

Subspaces & Linear Codes

Mathematics

A Linear Code is a Subspace

A linear code C is a k-dimensional subspace of GF(2)ⁿ.

This means C is closed under:

Vector addition: If c₁, c₂ ∈ C, then c₁ + c₂ ∈ C

Scalar multiplication: If c ∈ C and a ∈ GF(2), then a·c ∈ C

Key consequence: The zero vector 0 is always a codeword in a linear code. The sum of any two codewords is also a codeword.

Why linearity matters: We only need to store k basis vectors (the rows of G) instead of all 2^k codewords!

The [n, k, d] Code

Mathematics

Block length
Codeword length in bits

Dimension
Message length in bits

Min distance
Min weight of nonzero codeword

Code rate: R = k/n Redundancy: r = n - k |C| = 2^k

For linear codes: d_min = min{wt(c) : c ∈ C, c ≠ 0}. The minimum distance equals the minimum weight because wt(c₁ + c₂) = d(c₁, c₂).

Error correction capability: t = ⌊(d-1)/2⌋ Error detection: d-1 errors

Generator Matrix G

Mathematics

Definition

The generator matrix G is a k × n matrix whose rows form a basis for the code C.

c = m · G

where m is a 1 × k message vector and c is a 1 × n codeword.

Any k linearly independent codewords can serve as rows of G.

Systematic Form

G is in systematic form when:

G = [ I_k | P ]

where I_k is the k×k identity and P is a k×(n-k) parity matrix.

Codeword = [message | parity]

Any linear code can be put in systematic form via row reduction (Gaussian elimination over GF(2)).

Generator Matrix Example

Mathematics Code

Hamming [7, 4, 3] Code

G =

1	0	0	0	1	1	0
0	1	0	0	0	1	1
0	0	1	0	1	1	1
0	0	0	1	1	0	1

Example encoding: Message m = [1 0 1 1]

c = m \cdot G = [1 0 1 1] \cdot G = [1 0 1 1 | 1 0 0]

First 4 bits = message (systematic), last 3 bits = parity

The code has 2⁴ = 16 codewords, can correct 1 error (t = ⌊(3-1)/2⌋ = 1).

Parity-Check Matrix H

Mathematics

Definition & Relationship to G

The parity-check matrix H is an (n-k) × n matrix such that:

C = { c ∈ GF(2)ⁿ : H · c^T = 0 }

A vector v is a valid codeword if and only if H · v^T = 0.

Fundamental Relationship

G · H^T = 0

The rows of G are orthogonal to the rows of H

From Systematic G

If G = [I_k | P], then:

H = [ -P^T | I_n-k ]

In GF(2), -P^T = P^T

H and Minimum Distance

Mathematics

Connecting H to d_min

Theorem: The minimum distance d of a linear code equals the minimum number of linearly dependent columns of H.

Proof sketch: If c is a codeword of weight w, then Hc^T = 0, which means w columns of H sum to zero.

Hamming Code H Matrix

H =

1	0	1	1	1	0	0
1	1	1	0	0	1	0
0	1	1	1	0	0	1

Every column is distinct and nonzero. Any 2 columns are linearly independent, but there exist 3 dependent columns. Therefore d = 3.

Encoding Process

Code

Message
m (1×k)

→

Multiply
c = m · G

→

Codeword
c (1×n)

→

Channel
+ noise

→

Received
r = c + e

import numpy as np

def encode_linear(message, G):
    """Encode message using generator matrix over GF(2)."""
    return np.dot(message, G) % 2

# Hamming [7,4,3] generator matrix (systematic)
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1]
])

# Encode all 16 messages
for i in range(16):
    m = np.array([(i >> 3)&1, (i >> 2)&1, (i >> 1)&1, i&1])
    c = encode_linear(m, G)
    print(f"m={m} -> c={c}, weight={sum(c)}")

Syndrome Decoding

Mathematics

The Syndrome

When we receive r = c + e (codeword + error), compute:

s = H · r^T = H · (c + e)^T = H · c^T + H · e^T = H · e^T

The syndrome depends only on the error pattern, not on the transmitted codeword!

s = 0

Either no errors occurred, or the error pattern is itself a codeword (undetectable).

s ≠ 0

Errors detected! The syndrome identifies the coset containing the error pattern.

For single-error correction: The syndrome s equals the column of H corresponding to the error position!

Syndrome Decoding: Step by Step

Mathematics Code

Worked Example — Hamming [7,4,3]

Sent: c = [1 0 1 1 1 0 0]

Error: e = [0 0 0 0 0 1 0] (bit 5 flipped)

Received: r = [1 0 1 1 1 1 0]

Step 1: Compute syndrome s = H · r^T

s = [1, 1, 0] T

Step 2: Look up syndrome in table — matches column 5 of H

Step 3: Flip bit 5: ĉ = r + e₅ = [1 0 1 1 1 0 0]

Step 4: Extract message: m̂ = [1 0 1 1] ✓

Standard Array & Coset Leaders

Mathematics

The standard array partitions GF(2)ⁿ into cosets of the code C:

Coset Leader	Coset Members (leader + each codeword)
0000000	c₁	c₂	...	c₁₆
0000001	c₁+e	c₂+e	...	c₁₆+e
0000010	...	...	...	...
... (2^n-k = 2³ = 8 cosets total)

Coset leader = the minimum weight vector in each coset. Syndrome decoding maps syndrome → coset leader → error correction.

Storage cost: For large codes, the standard array has 2ⁿ entries — impractical! Use syndrome lookup table (only 2^n-k entries).

Syndrome Lookup Table

Code

Hamming [7,4,3] Syndrome Table

Syndrome s	Error Pattern e	Error Position
[0 0 0]	0000000	No error
[1 0 0]	0001000	Position 3
[0 1 0]	0100000	Position 1
[0 0 1]	0000001	Position 6
[1 1 0]	1000000	Position 0
[0 1 1]	0010000	Position 2
[1 0 1]	0000010	Position 4
[1 1 1]	0000100	Position 5

Only 2^n-k = 8 entries needed vs. 2ⁿ = 128 for full standard array

Dual Codes

Mathematics

Definition

The dual code C^⊥ of an [n, k] code C is:

C^⊥ = { v ∈ GF(2)ⁿ : v · c^T = 0 for all c ∈ C }

Properties

C^⊥ is an [n, n-k] code
G of C = H of C^⊥
H of C = G of C^⊥
(C^⊥)^⊥ = C

Self-Dual Codes

A code is self-dual if C = C^⊥.

Requires n = 2k (rate = 1/2).

Example: Extended [8,4,4] Hamming code is self-dual.

Self-dual codes have elegant mathematical properties and connections to lattices.

Weight Distribution

Mathematics

Weight Enumerator

The weight enumerator polynomial of code C:

W_C(x, y) = Σ_c∈C x^n-wt(c) · y^wt(c) = Σ_i=0ⁿ A_i · x^n-i · yⁱ

where A_i = number of codewords of weight i.

Example: Hamming [7,4,3]

Weight	0	1	2	3	4	5	6	7
Count A_i	1	0	0	7	7	0	0	1

W(x,y) = x⁷ + 7x⁴y³ + 7x³y⁴ + y⁷

Total: 1 + 7 + 7 + 1 = 16 = 2⁴ ✓

MacWilliams Identity

Mathematics

Connecting C and C^⊥

MacWilliams Identity (1963): The weight distribution of C^⊥ is determined by the weight distribution of C:

W_C^⊥(x, y) = (1/|C|) · W_C(x + y, x - y)

Significance

Knowing the weight distribution of C gives you the weight distribution of C^⊥ for free
Powerful tool for proving properties of codes
Constrains what weight distributions are possible

Applications

Computing undetected error probability
Bounding code performance
Proving non-existence of certain codes

Maximum Likelihood Decoding

Mathematics

Optimal but Expensive

Given received vector r, find the codeword ĉ that maximizes:

ĉ = argmax_c∈C P(r | c was sent)

For BSC with error probability p < 1/2, this is equivalent to:

ĉ = argmin_c∈C d(r, c) (minimum distance decoding)

Complexity: Requires comparing r against all 2^k codewords. For k = 100, that's 2¹⁰⁰ ≈ 10³⁰ comparisons — completely infeasible!

Syndrome decoding is ML decoding when the coset leader is the most probable error pattern — which is guaranteed for BSC when t ≤ ⌊(d-1)/2⌋ errors.

Singleton Bound

Mathematics

The Simplest Upper Bound

d ≤ n - k + 1

Proof: Delete any d-1 coordinate positions from all codewords. The resulting shortened words have length n-d+1. Since they must all be distinct (otherwise two codewords would differ in fewer than d positions), we need:

2 k \leq 2 n-d+1 ⟹ k \leq n - d + 1 ⟹ d \leq n - k + 1

MDS Codes: Codes achieving d = n - k + 1 are called Maximum Distance Separable. Reed-Solomon codes are MDS! Binary MDS codes are trivial (repetition, SPC, or trivial).

Hamming Bound (Sphere-Packing)

Mathematics

Σ_i=0^t C(n, i) ≤ 2^n-k

where t = ⌊(d-1)/2⌋ and C(n,i) = "n choose i".

Geometric Intuition

Place a sphere of radius t around each of the 2^k codewords. These spheres must not overlap. The total volume of all spheres cannot exceed the total space 2ⁿ.

Perfect Codes

Codes achieving equality are perfect codes. The spheres tile the entire space!

Hamming codes: [2^r-1, 2^r-r-1, 3]
Golay code: [23, 12, 7]
Trivial: repetition codes, GF(2)ⁿ

Proved in 1973 (Tietäväinen, van Lint): these are the ONLY binary perfect codes!

Plotkin Bound

Mathematics

If d is even and 2d > n: |C| ≤ 2d / (2d - n)

If d is odd: |C| ≤ 2(d+1) / (2d + 1 - n) when 2d + 1 > n

Proof Idea (Double Counting)

Count the total Hamming distance summed over all pairs:

S = Σ i\neqj d(c i, c j) \geq |C|(|C|-1) \cdot d

By counting column-by-column, we can show S ≤ |C|² · n/2.

Combining: |C|(|C|-1)d ≤ |C|²n/2 → solve for |C|.

Plotkin bound is tightest when d ≈ n/2 (high redundancy regime). For codes with d much smaller than n, Hamming bound is tighter.

Classic Linear Codes

Mathematics

Repetition Code

[n, 1, n]

G = [1 1 ... 1]

Just two codewords: all-zeros and all-ones.

Rate = 1/n → 0 as n grows. Maximum error correction for given n.

Single Parity Check

[n, n-1, 2]

H = [1 1 ... 1]

Append one parity bit. Detects all single errors but corrects none.

Rate = (n-1)/n → 1. Dual of repetition code!

Hamming Code

[2^r-1, 2^r-r-1, 3]

H has all nonzero r-bit columns.

The unique perfect 1-error correcting code family.

Examples: [7,4,3], [15,11,3], [31,26,3]

The repetition code and SPC code are duals of each other. The Hamming code is the dual of the simplex code.

Python: Linear Code Toolkit

Code

import numpy as np

class LinearCode:
    def __init__(self, G):
        """Initialize linear code from generator matrix G over GF(2)."""
        self.G = np.array(G) % 2
        self.k, self.n = self.G.shape
        self.H = self._compute_parity_check()

    def _compute_parity_check(self):
        """Compute H from systematic G = [I_k | P] -> H = [P^T | I_{n-k}]."""
        P = self.G[:, self.k:]       # extract parity part
        r = self.n - self.k
        return np.hstack([P.T, np.eye(r, dtype=int)]) % 2

    def encode(self, message):
        """Encode k-bit message to n-bit codeword."""
        return np.dot(message, self.G) % 2

    def syndrome(self, received):
        """Compute syndrome s = H @ r^T."""
        return np.dot(self.H, received) % 2

    def decode(self, received):
        """Syndrome decode: correct single errors."""
        s = self.syndrome(received)
        if not np.any(s):
            return received[:self.k]  # no error
        # Find matching column in H
        for i in range(self.n):
            if np.array_equal(s, self.H[:, i]):
                corrected = received.copy()
                corrected[i] ^= 1
                return corrected[:self.k]
        return None  # uncorrectable

Python: Encode & Decode Demo

Code

# Create Hamming [7,4,3] code
G = [[1,0,0,0, 1,1,0],
     [0,1,0,0, 0,1,1],
     [0,0,1,0, 1,1,1],
     [0,0,0,1, 1,0,1]]

hamming = LinearCode(G)

# Encode
msg = np.array([1, 0, 1, 1])
codeword = hamming.encode(msg)
print(f"Message:  {msg}")
print(f"Codeword: {codeword}")  # [1 0 1 1 1 0 0]

# Simulate single-bit error on position 2
received = codeword.copy()
received[2] ^= 1  # flip bit 2
print(f"Received: {received}")  # [1 0 0 1 1 0 0]

# Decode
s = hamming.syndrome(received)
print(f"Syndrome: {s}")  # [1 1 1] -> column 2 of H

decoded = hamming.decode(received)
print(f"Decoded:  {decoded}")   # [1 0 1 1] ✓

# Verify: test all single-error patterns
errors_corrected = 0
for msg_int in range(16):
    m = np.array([(msg_int>>i)&1 for i in range(3,-1,-1)])
    c = hamming.encode(m)
    for pos in range(7):
        r = c.copy()
        r[pos] ^= 1
        d = hamming.decode(r)
        if np.array_equal(d, m):
            errors_corrected += 1
print(f"Corrected {errors_corrected}/112 single errors")  # 112/112

Summary

Core Concepts

Linear codes are subspaces of GF(2)ⁿ
Generator matrix G encodes, parity-check H decodes
G · H^T = 0 links them
Syndrome s = H · r^T depends only on error

Key Results

d_min = min dependent columns of H
MacWilliams identity connects C and C^⊥
Singleton, Hamming, Plotkin bound d
Perfect codes: Hamming, Golay, trivial — that's all!

Next up: Convolutional codes — moving beyond block codes to memory-based encoding with the powerful Viterbi decoding algorithm.

End of Part 05 • Linear Block Codes

Linear Block Codes

Roadmap

Foundations

Matrices

Encoding

Decoding

Theory

Bounds & Examples

Vector Spaces over GF(2)

The Binary Field GF(2)

GF(2)n as a Vector Space

Subspaces & Linear Codes

A Linear Code is a Subspace

The [n, k, d] Code

Generator Matrix G

Definition

Systematic Form

Generator Matrix Example

Hamming [7, 4, 3] Code

Parity-Check Matrix H

Definition & Relationship to G

Fundamental Relationship

From Systematic G

H and Minimum Distance

Connecting H to dmin

Hamming Code H Matrix

Encoding Process

Syndrome Decoding

The Syndrome

s = 0

s ≠ 0

Syndrome Decoding: Step by Step

Worked Example — Hamming [7,4,3]

Standard Array & Coset Leaders

Syndrome Lookup Table

Hamming [7,4,3] Syndrome Table

Dual Codes

Definition

Properties

Self-Dual Codes

Weight Distribution

Weight Enumerator

Example: Hamming [7,4,3]

MacWilliams Identity

Connecting C and C⊥

Significance

Applications

Maximum Likelihood Decoding

Optimal but Expensive

Singleton Bound

The Simplest Upper Bound

Hamming Bound (Sphere-Packing)

Geometric Intuition

Perfect Codes

Plotkin Bound

Proof Idea (Double Counting)

Classic Linear Codes

Repetition Code

Single Parity Check

Hamming Code

Python: Linear Code Toolkit

Python: Encode & Decode Demo

Summary

Core Concepts

Key Results

GF(2)ⁿ as a Vector Space

Connecting H to d_min

Connecting C and C^⊥