Coding Theory Series — 02

Parity Checks & Simple Codes

The simplest error-detecting and error-correcting schemes

Mathematics Code History Applications

Roadmap

History

Early telegraph and telephone error checking methods

Parity Bits

Even parity, odd parity, and single parity check codes

Repetition

Triple repetition codes and majority voting

2D Parity

Rectangular codes for stronger error detection

Real-World

ISBN check digits and the Luhn algorithm

Tradeoffs

Rate vs. error correction capability

Historical Context

History

Error checking is as old as communication itself. Long before Shannon's formal theory, engineers devised ad-hoc methods to detect transmission errors.

Telegraph (1840s–1900s)

  • Operators used "check sums" — adding up character values and transmitting the total
  • Baudot code (1870s) included error-checking conventions
  • Vernam cipher (1917) used XOR — same operation underlying parity

Early Computing (1940s)

  • Punched card machines used parity rows for verification
  • Bell Labs relay computers checked parity after each operation
  • Hamming's frustration: machines could detect errors but not fix them
The parity bit is arguably the oldest and most ubiquitous error-detection mechanism in digital systems.

The Parity Bit

Mathematics

A parity bit is a single bit appended to a data word to make the total number of 1-bits either even or odd.

Even Parity

The parity bit is set so the total number of 1s (data + parity) is even.

1011001 → 10110010 (four 1s)
1010001 → 10100011 (four 1s)

Odd Parity

The parity bit is set so the total number of 1s is odd.

1011001 → 10110011 (five 1s)
1010001 → 10100010 (three 1s)

Even parity: p = d1 ⊕ d2 ⊕ d3 ⊕ ... ⊕ dk
Parity is simply the XOR of all data bits. This operation is both computationally trivial and implementable with a single XOR gate chain.

Single Parity Check Code

Mathematics

The single parity check (SPC) code is an [n, n−1, 2] code. It appends one parity bit to n−1 data bits.

Properties

  • Rate: R = (n−1)/n (very high)
  • dmin = 2: detects 1 error
  • Cannot correct any errors
  • All codewords have even weight

Example: [8, 7, 2]

Data (7 bits)   Parity   Codeword
0000000         0        00000000
1010101         0        10101010
1100110         0        11001100
1111111         1        11111111
Critical limitation: SPC cannot detect an even number of errors. If 2 bits flip, parity still checks out and the error goes undetected!

Python: Parity Encoding

Code
def encode_even_parity(data: str) -> str:
    """Append an even parity bit to a binary string."""
    parity = sum(int(b) for b in data) % 2
    return data + str(parity)

def check_even_parity(codeword: str) -> bool:
    """Check if a codeword has valid even parity."""
    return sum(int(b) for b in codeword) % 2 == 0

# Encoding examples
print(encode_even_parity("1011001"))  # "10110010" (4 ones, parity=0)
print(encode_even_parity("1010001"))  # "10100011" (3 ones, parity=1)

# Detection examples
print(check_even_parity("10110010"))  # True  - no error
print(check_even_parity("10110011"))  # False - 1 bit flipped!
print(check_even_parity("10110001"))  # True  - 2 bits flipped, UNDETECTED!
The last check shows the fundamental weakness of single parity check codes: when 2 bits flip simultaneously, parity cannot detect the error.

When Parity Fails

Mathematics Interactive

Single Error — Detected

Sent:     1 0 1 1 0 0 1 0   (parity = 0, sum = 4)
Received: 1 0 0 1 0 0 1 0   (sum = 3, odd → parity fails) ERROR!

Double Error — Undetected!

Sent:     1 0 1 1 0 0 1 0   (parity = 0, sum = 4)
Received: 1 0 0 1 0 1 1 0   (sum = 4, even → parity passes) Looks OK!

Triple Error — Detected

Sent:     1 0 1 1 0 0 1 0   (parity = 0, sum = 4)
Received: 0 0 0 1 0 1 1 0   (sum = 3, odd → parity fails) ERROR!

Pattern: SPC detects odd numbers of errors but misses even numbers. In general, it detects all error patterns of weight 1, 3, 5, ... but misses all patterns of weight 2, 4, 6, ...
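The odd/even pattern can be confirmed exhaustively. The sketch below (with a `check_even_parity` helper mirroring the one from the Python slide above, and an illustrative `detection_count` function) counts how many error patterns of each weight the parity check catches on the codeword 10110010:

```python
from itertools import combinations

def check_even_parity(codeword: str) -> bool:
    """True if the codeword passes the even-parity check."""
    return codeword.count('1') % 2 == 0

def detection_count(codeword: str, weight: int):
    """Count how many weight-`weight` error patterns the parity check catches."""
    n = len(codeword)
    detected = total = 0
    for positions in combinations(range(n), weight):
        received = list(codeword)
        for p in positions:
            received[p] = '1' if received[p] == '0' else '0'  # flip the bit
        total += 1
        if not check_even_parity(''.join(received)):
            detected += 1
    return detected, total

for w in range(1, 5):
    print(w, detection_count("10110010", w))
```

Every odd-weight pattern is caught; every even-weight pattern slips through.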

Repetition Codes

Mathematics

The simplest error-correcting code: repeat each bit n times. The resulting [n, 1, n] code can correct up to ⌊(n−1)/2⌋ errors.

[3, 1, 3]

Triple repetition

0 → 000
1 → 111

Corrects 1 error

[5, 1, 5]

Quintuple repetition

0 → 00000
1 → 11111

Corrects 2 errors

[7, 1, 7]

Septuple repetition

0 → 0000000
1 → 1111111

Corrects 3 errors

Terrible rate: R = 1/n. The triple repetition code wastes 2/3 of bandwidth! This is the brute-force approach — effective but extremely inefficient.

Majority Voting Decoder

Mathematics Code

For repetition codes, decoding is simple: take the majority vote of the received bits.

Triple Repetition Examples

Received 000 → decode as 0 (unanimous)
Received 001 → decode as 0 (2 vs 1)
Received 010 → decode as 0 (2 vs 1)
Received 011 → decode as 1 (1 vs 2)
Received 111 → decode as 1 (unanimous)

Error Analysis

For BSC with crossover probability p:

Perror = 3p²(1−p) + p³ = 3p² − 2p³

For p = 0.01: Perror ≈ 0.000298 (much better than raw p!)

def repetition_encode(bit: str, n: int = 3) -> str:
    """Encode a single bit using n-fold repetition."""
    return bit * n

def majority_vote_decode(received: str) -> str:
    """Decode using majority voting."""
    ones = received.count('1')
    return '1' if ones > len(received) // 2 else '0'

# Encode and simulate error
encoded = repetition_encode('1', 3)  # "111"
received = "110"  # one bit flipped
decoded = majority_vote_decode(received)  # "1" - correct!
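A small exhaustive check (the two functions are repeated here so the snippet runs standalone) confirms that majority voting corrects every single-bit error on a [3,1,3] codeword:

```python
def repetition_encode(bit: str, n: int = 3) -> str:
    """Encode a single bit using n-fold repetition."""
    return bit * n

def majority_vote_decode(received: str) -> str:
    """Decode using majority voting."""
    return '1' if received.count('1') > len(received) // 2 else '0'

# Every single-bit error on a [3,1,3] codeword decodes correctly
for bit in ('0', '1'):
    codeword = repetition_encode(bit)
    for i in range(3):
        flipped = '1' if codeword[i] == '0' else '0'
        corrupted = codeword[:i] + flipped + codeword[i+1:]
        assert majority_vote_decode(corrupted) == bit
print("all 6 single-bit error patterns corrected")
```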

Repetition Code Performance

Mathematics Code

How does error probability decrease as we increase repetition factor n?

from math import comb

def repetition_error_prob(p, n):
    """Probability of decoding error for [n,1,n] repetition code.
    Error occurs when more than n//2 bits are flipped."""
    t = n // 2  # correction capability
    p_error = sum(comb(n, i) * p**i * (1-p)**(n-i)
                  for i in range(t + 1, n + 1))
    return p_error

# Compare for p = 0.01
for n in [1, 3, 5, 7, 9]:
    pe = repetition_error_prob(0.01, n)
    rate = 1/n
    print(f"[{n},1,{n}]: P_error = {pe:.2e}, Rate = {rate:.3f}")
Code       Rate    Perror (p = 0.01)   Improvement
Uncoded    1.000   1.00 × 10⁻²         —
[3,1,3]    0.333   2.98 × 10⁻⁴         ~34×
[5,1,5]    0.200   9.85 × 10⁻⁶         ~1,000×
[7,1,7]    0.143   3.42 × 10⁻⁷         ~29,000×
Error probability drops exponentially with n, but the code rate falls as 1/n. This is the fundamental tradeoff.
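As a sanity check, the closed form 3p² − 2p³ from the majority-voting slide should agree with the general binomial sum for n = 3 — a quick sketch:

```python
from math import comb, isclose

def repetition_error_prob(p: float, n: int) -> float:
    """P(decoding error) for the [n,1,n] repetition code on a BSC(p)."""
    t = n // 2  # correction capability
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(t + 1, n + 1))

# For n = 3 the binomial sum collapses to 3p² − 2p³:
# C(3,2)p²(1−p) + p³ = 3p² − 3p³ + p³ = 3p² − 2p³
for p in (0.001, 0.01, 0.1):
    assert isclose(repetition_error_prob(p, 3), 3*p**2 - 2*p**3)
print("closed form matches the binomial sum")
```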

Two-Dimensional Parity

Mathematics

Arrange data in a rectangular grid and compute parity for each row AND each column. This creates a 2D parity check code.

Layout

  Data bits   | Row
  ------------+----
  1 0 1 1     |  1
  0 1 1 0     |  0
  1 1 0 1     |  1
  ------------+----
  0 0 0 0     |  0   ← column parities (corner bit covers the parity bits)

Properties

  • Detects all 1, 2, and 3-bit errors
  • Corrects all single-bit errors
  • Can locate the error at the intersection of the failed row and column checks
  • Some 4-bit error patterns go undetected
Single error correction: If row i and column j both fail parity, the error is at position (i, j). Flip that bit to correct it!

2D Parity: Worked Example

Mathematics Interactive

Original (with parity)

  1 0 1 1 | 1
  0 1 1 0 | 0
  1 1 0 1 | 1
  ---------+--
  0 0 0 0 | 0
                

All row/column parities correct

Error at position (1,1)

  1 0 1 1 | 1 ✓
  0 0 1 0 | 0 ✗
  1 1 0 1 | 1 ✓
  --------+---
  0 0 0 0 | 0
     
                

Row 1 fails, Col 1 fails → error at (1,1)

Rate = (r · c) / ((r+1) · (c+1))    For 4×3: R = 12/20 = 0.60
Limitation: A rectangular pattern of 4 errors (forming a rectangle in the grid) is undetectable — each row and column parity still checks out.

Python: 2D Parity

Code
def encode_2d_parity(data, rows, cols):
    """Encode data with 2D parity (row + column + corner)."""
    grid = [data[i*cols:(i+1)*cols] for i in range(rows)]
    # Add row parity
    for row in grid:
        row.append(sum(row) % 2)
    # Add column parity row
    col_parity = [sum(grid[r][c] for r in range(rows)) % 2
                  for c in range(cols + 1)]
    grid.append(col_parity)
    return grid

def check_2d_parity(grid):
    """Check 2D parity; return error position or None."""
    rows, cols = len(grid), len(grid[0])
    bad_rows = [r for r in range(rows) if sum(grid[r]) % 2 != 0]
    bad_cols = [c for c in range(cols)
                if sum(grid[r][c] for r in range(rows)) % 2 != 0]
    if not bad_rows and not bad_cols:
        return None  # No error detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])  # Single error located!
    return "Multiple errors detected"

# Example
data = [1,0,1,1, 0,1,1,0, 1,1,0,1]
grid = encode_2d_parity(data, 3, 4)
grid[1][1] ^= 1  # Flip bit at (1,1)
print(check_2d_parity(grid))  # (1, 1)
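The rectangular blind spot noted above is easy to demonstrate. The sketch below uses a compact re-implementation of the 2D parity encoder (names like `encode_2d` and `all_checks_pass` are illustrative, not from the code above) and flips four bits forming a rectangle:

```python
def parity(bits):
    """Even parity of a list of bits."""
    return sum(bits) % 2

def encode_2d(data, rows, cols):
    """Build a (rows+1) x (cols+1) grid with row, column, and corner parity."""
    grid = [data[i*cols:(i+1)*cols] for i in range(rows)]
    for row in grid:
        row.append(parity(row))  # row parity
    grid.append([parity([grid[r][c] for r in range(rows)])
                 for c in range(cols + 1)])  # column parity row (incl. corner)
    return grid

def all_checks_pass(grid):
    """True if every row and every column has even parity."""
    return (all(parity(row) == 0 for row in grid) and
            all(parity([row[c] for row in grid]) == 0
                for c in range(len(grid[0]))))

data = [1,0,1,1, 0,1,1,0, 1,1,0,1]
grid = encode_2d(data, 3, 4)

# Flip four bits at the corners of a rectangle: rows 0 & 1, columns 0 & 2.
# Each affected row and column sees TWO flips, so every parity still holds.
for r, c in [(0, 0), (0, 2), (1, 0), (1, 2)]:
    grid[r][c] ^= 1

print(all_checks_pass(grid))  # True — four errors, yet every check passes
```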

ISBN Check Digits

Applications History

The International Standard Book Number (ISBN) uses a check digit system to catch transcription errors when book numbers are typed or spoken.

ISBN-10 (pre-2007)

10 digits: d1d2...d10

1·d1 + 2·d2 + ... + 10·d10 ≡ 0 (mod 11)

Last digit can be 'X' (representing 10)

Detects: all single-digit errors AND all transpositions of adjacent digits!

ISBN-13 (current)

13 digits: d1d2...d13

w1·d1 + w2·d2 + ... + w13·d13 ≡ 0 (mod 10)

Weights alternate: 1, 3, 1, 3, ...

Detects: all single-digit errors but NOT all transpositions.

Why modular arithmetic? Using mod 11 (a prime) for ISBN-10 ensures that all single-digit substitutions and adjacent transpositions are caught — a property mod 10 cannot guarantee.

Python: ISBN Validation

Code
def validate_isbn10(isbn: str) -> bool:
    """Validate an ISBN-10 check digit."""
    isbn = isbn.replace('-', '')
    if len(isbn) != 10:
        return False
    total = 0
    for i, ch in enumerate(isbn):
        if ch == 'X' and i == 9:
            val = 10
        elif ch.isdigit():
            val = int(ch)
        else:
            return False
        total += (i + 1) * val
    return total % 11 == 0

def validate_isbn13(isbn: str) -> bool:
    """Validate an ISBN-13 check digit."""
    isbn = isbn.replace('-', '')
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    weights = [1, 3] * 6 + [1]
    total = sum(int(d) * w for d, w in zip(isbn, weights))
    return total % 10 == 0

# Examples
print(validate_isbn10("0-306-40615-2"))  # True
print(validate_isbn13("978-0-306-40615-7"))  # True
print(validate_isbn10("0-306-40615-3"))  # False (wrong check digit)

ISBN-10: Transposition Detection

Mathematics

Why does ISBN-10 detect all adjacent transpositions? Consider swapping digits at positions j and j+1:

Original: ∑ = ... + j·a + (j+1)·b + ...
Swapped:  ∑' = ... + j·b + (j+1)·a + ...
∑' − ∑ = j(b−a) + (j+1)(a−b) = (a−b)(j+1−j) = (a−b)

For this to go undetected: (a − b) ≡ 0 (mod 11)

Since 0 ≤ a, b ≤ 10 and a ≠ b: |a − b| ranges from 1 to 10, none of which are divisible by 11.

The magic of mod 11: Because 11 is prime and larger than all digit values, no non-trivial digit difference is zero mod 11. This algebraic property is why ISBN-10 is mathematically superior to ISBN-13 for transposition detection.
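The argument can also be confirmed by brute force — enumerating every adjacent position and every pair of distinct digit values (0–10, treating X as 10 at every position for simplicity, a superset of legal ISBNs) and checking that the weighted sum always changes mod 11:

```python
# Exhaustive check: no adjacent transposition in ISBN-10 escapes detection
undetected = []
for j in range(1, 10):          # 1-based position of the left digit
    for a in range(11):         # digit values, X counted as 10
        for b in range(11):
            if a == b:
                continue
            # change in the weighted sum after swapping positions j and j+1;
            # algebraically this simplifies to (a − b)
            delta = j*(b - a) + (j + 1)*(a - b)
            if delta % 11 == 0:
                undetected.append((j, a, b))

print(len(undetected))  # 0
```

Since |a − b| is between 1 and 10 and 11 is prime, no case survives.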

The Luhn Algorithm

Applications History

Invented by Hans Peter Luhn at IBM in 1954, the Luhn algorithm is used to validate credit card numbers, IMEI numbers, and various ID systems.

Algorithm Steps

  1. Starting from the rightmost digit, double every second digit
  2. If doubling produces a number > 9, subtract 9
  3. Sum all digits
  4. If total mod 10 = 0, the number is valid

Example: 4539 1488 0343 6467

Original:  4 5 3 9 1 4 8 8 0 3 4 3 6 4 6 7
Doubled:   8 5 6 9 2 4 7 8 0 3 8 3 3 4 3 7
Sum:       8+5+6+9+2+4+7+8+0+3+8+3+3+4+3+7 = 80
80 mod 10 = 0 ✓ Valid!

Python: Luhn Algorithm

Code
def luhn_validate(number: str) -> bool:
    """Validate a number using the Luhn algorithm."""
    digits = [int(d) for d in number if d.isdigit()]
    # Process from right to left, doubling every second digit
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # Every second digit from right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

def luhn_generate_check_digit(partial: str) -> int:
    """Generate the Luhn check digit for a partial number."""
    for check in range(10):
        if luhn_validate(partial + str(check)):
            return check
    return -1  # Should never reach here

# Examples
print(luhn_validate("4539148803436467"))  # True
print(luhn_validate("4539148803436468"))  # False
print(luhn_generate_check_digit("453914880343646"))  # 7
Luhn detects: all single-digit errors and most (but not all) transpositions of adjacent digits. It fails for transpositions of 0↔9.
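The 0↔9 blind spot is easy to exhibit with a minimal 3-digit example (the validator is repeated here so the snippet runs standalone): "091" carries a valid Luhn check digit, and so does its transposed form "901".

```python
def luhn_validate(number: str) -> bool:
    """Luhn check (same algorithm as above)."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

# "091" is Luhn-valid; swapping the adjacent 0 and 9 gives "901",
# which is ALSO valid — the transposition slips through undetected
print(luhn_validate("091"), luhn_validate("901"))  # True True
```

The reason: whichever of the two positions is doubled, 0 contributes 0 and 9 contributes 9 (since 2·9 − 9 = 9), so the checksum is unchanged by the swap.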

Luhn: Strengths & Limitations

Mathematics

Detects

  • All single-digit substitution errors
  • Most adjacent transpositions
  • Simple to implement (no division)
  • Works with standard decimal digits

Misses

  • Transposition of 0 ↔ 9 (09 ↔ 90)
  • Some twin errors (e.g., 22 ↔ 55)
  • Any error pattern that preserves the checksum
  • NOT a security measure — easily forged
Important: The Luhn algorithm is a sanity check, not a security feature. It catches accidental typos but provides zero protection against intentional fraud. Never confuse error detection with authentication!

Better alternative: The Verhoeff algorithm (1969) uses the dihedral group D5 to catch ALL single errors and ALL adjacent transpositions, but is more complex to implement.

Rate vs. Correction Tradeoff

Mathematics

Every error-control code faces a fundamental tradeoff: more redundancy means better error protection but lower throughput.

Code              [n, k, d]     Rate    Detects   Corrects
Uncoded           [n, n, 1]     1.000   0         0
Single parity     [8, 7, 2]     0.875   1         0
2D parity (4×3)   [20, 12, 4]   0.600   3         1
Repetition-3      [3, 1, 3]     0.333   2         1
Repetition-5      [5, 1, 5]     0.200   4         2
Hamming(7,4)      [7, 4, 3]     0.571   2         1
Notice: Hamming(7,4) corrects 1 error just like repetition-3, but at rate 0.571 vs 0.333. Better codes achieve more correction with less redundancy. This is what coding theory optimization is all about.

Limitations of Simple Codes

Mathematics

Single Parity Check

  • Cannot correct errors, only detect
  • Misses all even-weight error patterns
  • No information about error location
  • Requires ARQ (retransmission) protocol

Repetition Codes

  • Catastrophically low code rate
  • Only encodes 1 bit at a time
  • Bandwidth cost grows linearly with protection
  • Far from any theoretical bound

2D Parity

  • Misses rectangular error patterns
  • Limited to single-error correction
  • Overhead grows with grid size
  • Not algebraically elegant

Check Digit Schemes

  • Application-specific, not general-purpose
  • Detect only, cannot correct
  • Limited error pattern coverage
  • Not suitable for binary channels

Summary & What's Next

Key Takeaways

  • Parity bits are the simplest error detection — XOR of all data bits
  • Repetition codes can correct errors but at terrible rates
  • 2D parity improves on both by combining row and column checks
  • Real-world check digits (ISBN, Luhn) use modular arithmetic for human-facing validation
  • All simple codes fall far short of Shannon's channel capacity

Coming Up: Hamming Codes

Richard Hamming asked: can we do much better? His answer was a resounding yes.

Hamming(7,4): Corrects 1 error with rate 4/7 ≈ 0.57, compared to repetition-3's rate of 0.33 for the same correction capability.

We'll see how clever placement of parity bits enables not just detection but location of errors.

Simple Codes
Hamming Codes
CRC
Advanced Codes