Coding Theory Series — 06

Convolutional Codes & Viterbi Decoding

Memory-based encoding, trellis representations, and the algorithm that reached the edge of the solar system

Mathematics Code History Applications

Roadmap

  • Origins: Elias (1955), Viterbi (1967), history and motivation
  • Encoder: shift registers, generator polynomials, state machines
  • Representations: state diagrams, trellis diagrams, code trees
  • Viterbi Algorithm: path metrics, branch metrics, survivor paths, trace-back
  • Advanced Topics: soft decisions, puncturing, sequential decoding
  • Applications: deep space, GSM, WiFi, modems, and more

Historical Origins


Peter Elias (1955)

At MIT, Elias introduced convolutional codes as an alternative to block codes. Unlike block codes that operate on fixed-size message blocks, convolutional codes process data as a continuous stream.

Elias was motivated by the impracticality of achieving Shannon's capacity with block codes of the lengths available at the time.

Andrew Viterbi (1967)

At UCLA/Linkabit, Viterbi invented the decoding algorithm that bears his name. Originally described for decoding convolutional codes, the Viterbi algorithm is now recognized as a general solution for finding the most likely sequence of hidden states.

Viterbi later co-founded Qualcomm. The algorithm is used in speech recognition, DNA sequence analysis, and more.

Impact: Convolutional codes + Viterbi decoding became the standard for reliable communication in space, cellular, and data networks for over 40 years.

Block Codes vs. Convolutional Codes

Property  | Block codes            | Convolutional codes
----------|------------------------|-----------------------------------
Input     | Fixed k-bit blocks     | Continuous bit stream
Memory    | None (memoryless)      | Depends on previous m inputs
Encoding  | Matrix multiplication  | Shift register + XOR
Output    | n-bit codewords        | n bits per input bit (continuous)
Decoding  | Syndrome, algebraic    | Viterbi (trellis search)
Latency   | One block delay        | About 5× constraint length typical
Key insight: Convolutional codes spread redundancy across the bit stream rather than within fixed blocks, enabling better performance at the same rate.

Encoder Structure

Shift Register Encoder

[Figure: the input bit uₜ enters a chain of delay registers s₁, s₂; XOR networks tap the input and register contents to form the output bits.]

Parameters

  • Rate R = k/n: k input bits produce n output bits
  • Constraint length K: number of memory stages plus one — the number of input bits that influence each output
  • Memory m = K - 1: number of delay elements
  • States: 2ᵐ possible encoder states

Generator Polynomials

Each output is a mod-2 sum of selected register taps:

g⁽¹⁾ = (1, 1, 1) → output 1 = u ⊕ s₁ ⊕ s₂
g⁽²⁾ = (1, 0, 1) → output 2 = u ⊕ s₂

Octal notation: g₁ = 7, g₂ = 5 (common rate-1/2 code)
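As a quick sanity check, the two tap equations above can be written directly; this is a minimal sketch, and the helper name `encoder_outputs` is ours:

```python
def encoder_outputs(u, s1, s2):
    """Outputs of the rate-1/2 (7,5) encoder for input u and state (s1, s2)."""
    v1 = u ^ s1 ^ s2   # g(1) = (1,1,1): taps on input, s1, s2
    v2 = u ^ s2        # g(2) = (1,0,1): taps on input and s2
    return v1, v2

# First input bit 1 from the all-zero state emits the pair (1, 1)
print(encoder_outputs(1, 0, 0))   # -> (1, 1)
```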

Generator Polynomials


Polynomial Representation

The encoder output can be described as convolution (hence the name!):

v⁽ʲ⁾(D) = u(D) · g⁽ʲ⁾(D)    (mod 2)

where D is the delay operator (D-transform).

Example: Rate 1/2, K=3 Encoder (the classic textbook code)

g₁(D) = 1 + D + D²

Binary: 111, Octal: 7

Taps: input, s₁, s₂

g₂(D) = 1 + D²

Binary: 101, Octal: 5

Taps: input, s₂ only

Convention: Generator polynomials are often written in octal: (7, 5) for K=3 rate-1/2. NASA's Voyager used (171, 133) — a K=7 rate-1/2 code.
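Since encoding is polynomial multiplication over GF(2), a few lines of convolution reproduce the encoder's two output streams; the helper name `poly_mul_mod2` is ours:

```python
def poly_mul_mod2(u, g):
    """Multiply bit sequences as polynomials in D over GF(2), i.e. convolve mod 2."""
    out = [0] * (len(u) + len(g) - 1)
    for i, ui in enumerate(u):
        for j, gj in enumerate(g):
            out[i + j] ^= ui & gj
    return out

u  = [1, 1, 0, 1, 0]   # message u(D) = 1 + D + D^3
g1 = [1, 1, 1]         # g1(D) = 1 + D + D^2  (octal 7)
g2 = [1, 0, 1]         # g2(D) = 1 + D^2      (octal 5)

v1 = poly_mul_mod2(u, g1)
v2 = poly_mul_mod2(u, g2)
# Interleaving v1 and v2 gives the encoder's output stream, tail included
print(list(zip(v1, v2)))
```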

State Diagram

Mathematics

Rate 1/2, K=3 Encoder — States: {00, 01, 10, 11}

State Transitions

Current | Input 0: Next / Output | Input 1: Next / Output
00      | 00 / 00                | 10 / 11
01      | 00 / 11                | 10 / 00
10      | 01 / 10                | 11 / 01
11      | 01 / 01                | 11 / 10

Reading the Table

  • State = contents of the shift registers (s₁s₂)
  • On input u: the register shifts right (s₂ ← s₁, s₁ ← u)
  • Output: computed from the taps via XOR

The state diagram is a finite state machine with 2ᵐ = 4 states.

State diagram vs code tree: State diagram shows all possible transitions compactly. The code tree expands every path — exponential growth but shows complete history.
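The transition table can also be generated programmatically. A small sketch with our own helper names, representing the state as a (s₁, s₂) tuple:

```python
def step(state, u, gens=((1, 1, 1), (1, 0, 1))):
    """One encoder step for the (7,5) code: return (next_state, outputs).

    state is the tuple (s1, s2); on each input the register shifts right.
    """
    bits = (u,) + state                                       # (u, s1, s2)
    out = tuple(sum(b & t for b, t in zip(bits, g)) % 2 for g in gens)
    return (u,) + state[:-1], out                             # s1 <- u, s2 <- old s1

for s1 in (0, 1):
    for s2 in (0, 1):
        cells = []
        for u in (0, 1):
            (n1, n2), (o1, o2) = step((s1, s2), u)
            cells.append(f"in {u}: -> {n1}{n2} / out {o1}{o2}")
        print(f"state {s1}{s2}   " + "   ".join(cells))
```

The printed rows match the state transition table above.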

Trellis Diagram


The Key to Viterbi Decoding

The trellis unfolds the state diagram in time. Each column represents one time step; each node represents a state.

Time:      t=0        t=1        t=2        t=3

State 00:   00 ——00—→  00 ——00—→  00 ——00—→  00
State 01:              01         01         01
State 10:              10         10         10
State 11:              11         11         11

Solid lines = input 0, dashed = input 1. Labels = output bits. (Only the all-zeros path is drawn here; in the full trellis every state has two outgoing branches per step, so paths fan out and re-merge.)

Crucial property: Many paths merge at each state. At most 2ᵐ surviving paths need to be tracked — this is what makes Viterbi tractable!

Code Rate & Constraint Length


Code Rate R = k/n

Common rates:

Rate | Meaning      | Use case
1/2  | 1 in → 2 out | Deep space, WiFi
1/3  | 1 in → 3 out | Very noisy channels
2/3  | 2 in → 3 out | Bandwidth-limited
3/4  | 3 in → 4 out | Punctured codes

Constraint Length K

K determines the encoder's memory and code strength:

K  | States | Viterbi complexity
3  | 4      | Trivial
5  | 16     | Easy
7  | 64     | Standard (Voyager)
9  | 256    | Moderate
14 | 8192   | Near practical limit

Complexity trade-off: Viterbi decoder complexity grows as O(2ᴷ) — exponential in constraint length! This is why K rarely exceeds ~10 in practice.

Free Distance


The Key Performance Metric

The free distance dfree is the minimum Hamming distance between any two codeword sequences that start and end in the same state:

dfree = min { d(v, v') : v ≠ v', both start and end at state 0 }

Equivalently: the minimum weight path that diverges from and re-merges with the all-zeros path.

Error Correction Capability

t = ⌊(dfree - 1) / 2⌋

For the (7,5) code: dfree = 5, so t = 2.

Common Codes

Code                  | dfree
R=1/2, K=3 (7,5)      | 5
R=1/2, K=5 (23,35)    | 7
R=1/2, K=7 (171,133)  | 10
R=1/3, K=3 (7,7,5)    | 8
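Because every branch has a non-negative output weight, dfree can be computed as a shortest-path problem over the state graph: force one divergence from state 0, then run Dijkstra until the path re-merges. This is a sketch with our own helper names:

```python
import heapq

def free_distance(gens_octal, K):
    """Minimum output weight over paths that diverge from state 0 and
    later re-merge with it (Dijkstra: branch weights are non-negative)."""
    m = K - 1
    gens = [int(g, 8) for g in gens_octal]

    def branch(state, u):
        """Next state and output Hamming weight for input u from state."""
        bits = (u << m) | state    # register contents including the new input
        weight = sum(bin(bits & g).count("1") % 2 for g in gens)
        return bits >> 1, weight

    # Force the initial divergence: take input 1 out of state 0
    start, w0 = branch(0, 1)
    dist = {start: w0}
    heap = [(w0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == 0:
            return d               # first arrival back at state 0 is minimal
        if d > dist.get(s, float("inf")):
            continue
        for u in (0, 1):
            ns, w = branch(s, u)
            if d + w < dist.get(ns, float("inf")):
                dist[ns] = d + w
                heapq.heappush(heap, (d + w, ns))

print(free_distance(("7", "5"), 3))       # -> 5
print(free_distance(("171", "133"), 7))   # -> 10
```

Both results agree with the table above.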

The Viterbi Algorithm


Maximum Likelihood Sequence Estimation

Given received sequence r, find the most likely transmitted sequence v:

v̂ = argmax_v P(r | v) = argmin_v d(r, v)

The decoder proceeds in five steps:

1. Initialize path metrics (state 0 gets metric 0, all others ∞)
2. Compute branch metrics for the received symbols
3. Add-Compare-Select (ACS) at every state
4. Store the survivor path into each state
5. Repeat steps 2-4 for each time step, then trace back
Key insight: At each state, only the BEST path (survivor) needs to be kept. All other paths merging at that state are provably suboptimal and can be discarded.

Branch & Path Metrics


Branch Metric

Distance between received bits and expected output on each trellis branch:

BM = dH(received, expected)

Hard decision: Hamming distance (count differing bits)

Soft decision: Euclidean distance or correlation metric

Path Metric

Cumulative metric along a path through the trellis:

PM(state, t) = PM(prev, t-1) + BM

The path with the smallest cumulative metric is the most likely (for hard decisions).

Add-Compare-Select (ACS)

For each state at time t, exactly 2 paths arrive. Add branch metric to each predecessor's path metric, compare the two candidates, select the survivor (lower metric wins).
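In miniature (the names are ours), one ACS decision looks like this:

```python
def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-Compare-Select for one state: two paths arrive, the better survives.

    Returns (survivor metric, index of the winning predecessor: 0 or 1).
    """
    cand_a = pm_a + bm_a   # add
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:   # compare (ties broken toward predecessor 0)
        return cand_a, 0   # select
    return cand_b, 1

# Predecessor A: path metric 2, branch metric 1; predecessor B: 0 and 2.
print(acs(2, 1, 0, 2))   # -> (2, 1): the path through B survives
```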

Viterbi: Worked Example (1/3)


Setup: Rate 1/2, K=3, generators (7,5)

Transmitted message: u = [1, 1, 0, 1, 0]

Encoded output (message plus two tail bits): v = [11, 01, 01, 00, 10, 11, 00]

Received with errors: r = [11, 00, 01, 00, 10, 11, 00]

(Error in second pair: received 00 instead of 01)

Step 1: Initialize

State      | Metric at t=0
00 (start) | 0
01, 10, 11 | ∞ (impossible)

Step 2: t=0, received r₁ = 11

From state 00: input 0 → output 00, BM = d(11,00) = 2 → state 00, PM = 2

From state 00: input 1 → output 11, BM = d(11,11) = 0 → state 10, PM = 0

Viterbi: Worked Example (2/3)

Mathematics

Step 3: t=1, received r₂ = 00

To state | From (input, output) | Path metric      | Survivor?
00       | 00 (input 0, out 00) | 2 + d(00,00) = 2 | Yes
00       | 01 (input 0, out 11) | ∞                | No
01       | 10 (input 0, out 10) | 0 + d(00,10) = 1 | Yes
01       | 11 (input 0, out 01) | ∞                | No
10       | 00 (input 1, out 11) | 2 + d(00,11) = 4 | Yes
10       | 01 (input 1, out 00) | ∞                | No
11       | 10 (input 1, out 01) | 0 + d(00,01) = 1 | Yes
11       | 11 (input 1, out 10) | ∞                | No

(States 01 and 11 are unreachable at t=1, so branches leaving them carry metric ∞.)
Notice: Even though the second received pair was corrupted, the path of the true message (now at state 11 with metric 1) remains competitive. The Viterbi algorithm keeps it alive!

Viterbi: Worked Example (3/3)


Continuing the ACS process...

After processing all received pairs, each state has a single surviving path and a cumulative path metric.

Trace-Back

Starting from state 00 at the final time step (after tail bits flush the encoder), trace the survivor path backward:

Read forward, the surviving path is:

States: 00 → 10 → 11 → 01 → 10 → 01 → 00 → 00
Inputs:    1    1    0    1    0    0    0
           (message bits)        (tail bits)

Decoded message: [1, 1, 0, 1, 0], which matches the original! The single error was successfully corrected by the Viterbi algorithm.
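The whole example can be replayed end to end. This compact, self-contained sketch (all names are ours) re-implements the (7,5) encoder and a hard-decision Viterbi pass, flips one bit in the second output pair, and recovers the message:

```python
GENS, K = (0b111, 0b101), 3          # generators (7, 5) in binary
M = K - 1                            # encoder memory

def branch(state, u):
    """One trellis branch: (next_state, output pair) for input u."""
    bits = (u << M) | state          # (input, s1, s2) as a 3-bit word
    out = [bin(bits & g).count("1") % 2 for g in GENS]
    return bits >> 1, out

def encode(msg):
    """Encode msg plus M tail bits."""
    state, out = 0, []
    for u in msg + [0] * M:
        state, o = branch(state, u)
        out.extend(o)
    return out

def viterbi(rx):
    """Hard-decision Viterbi decode; strips the tail bits."""
    INF = float("inf")
    n_states = 1 << M
    pm = [0] + [INF] * (n_states - 1)        # path metrics, start in state 0
    paths = [[] for _ in range(n_states)]    # survivor input sequences
    for t in range(len(rx) // 2):
        r = rx[2 * t: 2 * t + 2]
        new_pm = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if pm[s] == INF:
                continue
            for u in (0, 1):
                ns, o = branch(s, u)
                cand = pm[s] + sum(a != b for a, b in zip(r, o))
                if cand < new_pm[ns]:        # compare-select
                    new_pm[ns] = cand
                    new_paths[ns] = paths[s] + [u]
        pm, paths = new_pm, new_paths
    return paths[0][:-M]                     # best path ending in state 0

msg = [1, 1, 0, 1, 0]
rx = encode(msg)
rx[3] ^= 1                    # corrupt one bit of the second output pair
print(viterbi(rx) == msg)     # -> True: the single error is corrected
```

Any competing path through the trellis differs from the transmitted sequence in at least dfree = 5 coded bits, so with a single channel error the true path always wins.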

Hard vs. Soft Decision Decoding


Hard Decision

Demodulator outputs 0 or 1 (binary).

Branch metric: Hamming distance

BM = Σᵢ |rᵢ - vᵢ|

Simpler hardware, but throws away reliability information.

Soft Decision

Demodulator outputs analog values (e.g., 3-bit quantization: 0-7).

Branch metric: Euclidean or correlation

BM = Σᵢ (rᵢ - vᵢ)²

More complex but preserves channel reliability.

Soft decision gain: approximately 2 dB improvement over hard decision — equivalent to doubling the transmitter power or halving the distance! This is practically "free" performance.

Most modern Viterbi decoders use 3-bit (8-level) soft quantization. Diminishing returns beyond 4 bits.
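The gain comes from keeping the demodulator's confidence. A toy comparison (all numbers invented for illustration) shows how hard quantization can destroy a decision that soft metrics get right:

```python
# BPSK mapping: bit 0 -> +1.0, bit 1 -> -1.0
branches = {"out 00": [0, 0], "out 11": [1, 1]}   # two candidate branch outputs
received = [-0.1, +0.9]   # first symbol barely negative, second strongly positive

# Hard decision: quantize to bits first, then count disagreements
hard = [1 if r < 0 else 0 for r in received]                     # -> [1, 0]
bm_hard = {k: sum(h != b for h, b in zip(hard, bits))
           for k, bits in branches.items()}

# Soft decision: squared Euclidean distance to the expected symbols
bm_soft = {k: sum((r - (1.0 if b == 0 else -1.0)) ** 2
                  for r, b in zip(received, bits))
           for k, bits in branches.items()}

print(bm_hard)   # both branches at Hamming distance 1: a coin flip
print(bm_soft)   # "out 00" is clearly closer; the reliability survives
```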

Punctured Convolutional Codes


Achieving Higher Rates from a Base Code

Start with a low-rate code (e.g., R=1/2) and periodically delete (puncture) some output bits to increase the effective rate.

Puncture Pattern Example

Base: R=1/2, K=7

P = [ 1 1 ] → R = 2/3
    [ 1 0 ]

1 = transmit, 0 = puncture

Every 2 input bits: transmit 3 of 4 output bits.

Advantages

  • Flexible rate: one encoder/decoder supports many rates
  • Rate-compatible: can adapt to channel conditions
  • Used in: WiFi (802.11), LTE, DVB

Decoder inserts "erasures" at punctured positions and runs standard Viterbi.
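A sketch of the mechanics, using the pattern above (the function names are ours):

```python
# Puncture pattern for rate 2/3 from a rate-1/2 mother code:
# row j = encoder output j, column = position in the 2-input period.
P = [[1, 1],
     [1, 0]]     # 1 = transmit, 0 = delete

def puncture(stream, P):
    """Delete the mother-code output bits marked 0 in the cyclic pattern."""
    n, period = len(P), len(P[0])
    out = []
    for i in range(0, len(stream), n):
        col = (i // n) % period
        out.extend(stream[i + j] for j in range(n) if P[j][col])
    return out

def depuncture(stream, P, n_blocks, erasure=None):
    """Re-insert erasures at punctured positions so standard Viterbi can run."""
    n, period = len(P), len(P[0])
    it = iter(stream)
    out = []
    for i in range(n_blocks):
        col = i % period
        out.extend([next(it) if P[j][col] else erasure for j in range(n)])
    return out

coded = [1, 1, 0, 1, 0, 1, 0, 0]    # four rate-1/2 output pairs
sent = puncture(coded, P)           # 6 of 8 bits survive -> rate 2/3
print(sent)                         # -> [1, 1, 0, 0, 1, 0]
print(depuncture(sent, P, 4))       # erasures (None) at the deleted positions
```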

Applications: Deep Space


Convolutional Codes in Space Exploration

Voyager 1 & 2 (1977)

R=1/2, K=7 convolutional code (171, 133)

Later concatenated with RS outer code

Still transmitting from interstellar space at 160 bps!

Mars Rovers (2004+)

Turbo codes (parallel concatenated convolutional codes)

Operate within about 0.8 dB of the Shannon limit

New Horizons (2015)

K=15 convolutional code for Pluto flyby

Signal power: 10⁻¹⁶ watts at Earth!

James Webb (2022)

Modern LDPC codes, but heritage from convolutional era

Applications: Terrestrial Communications


GSM Cellular (2G)

R=1/2, K=5 convolutional code

Different rates for voice (R=1/2) and data (R=1/3)

Deployed 1991, billions of phones

WiFi (802.11a/g/n)

R=1/2, K=7 base code (171, 133)

Punctured to R=2/3, 3/4, 5/6

Viterbi decoder in every WiFi chip

Dial-Up Modems (V.32/V.34)

Trellis-coded modulation (TCM)

Ungerboeck's coding + modulation

Enabled 28.8/33.6 kbps over phone lines

Digital TV (DVB-T)

Convolutional inner code + RS outer code

Multiple rate options for different conditions

Replaced by LDPC in DVB-T2

Sequential Decoding


An Alternative to Viterbi

For large constraint lengths (K > 15), Viterbi is impractical. Sequential decoding explores the trellis selectively rather than exhaustively.

Fano Algorithm (1963)

  • Explores code tree depth-first
  • Uses a running threshold
  • Moves forward if metric improves
  • Backs up and tries alternatives otherwise
  • Adjusts threshold dynamically

Variable computation time — fast when channel is good.

Stack Algorithm (Zigangirov-Jelinek)

  • Maintains a sorted stack of partial paths
  • Always extends the best path
  • Simpler logic than Fano
  • Requires more memory (stack grows)

In principle finds the ML path, given enough stack memory.

Average vs. worst case: Sequential decoders have variable computation — occasional "computational blowup" when noise is severe. Not suitable for hard real-time systems.

Python: Convolutional Encoder

class ConvolutionalEncoder:
    """Rate 1/n convolutional encoder with constraint length K."""
    def __init__(self, generators, K):
        """
        generators: list of generator polynomials (octal strings)
        K: constraint length (memory m = K - 1)
        """
        self.K = K
        self.n = len(generators)
        # Parse each octal generator once into K tap bits, MSB first:
        # tap 0 = current input, tap i = register s_i.
        self.taps = [[int(b) for b in bin(int(str(g), 8))[2:].zfill(K)]
                     for g in generators]
        self.state = 0  # shift register contents

    def encode_bit(self, input_bit):
        """Encode a single input bit; return the n output bits."""
        # Prepend the new input bit to the register contents
        combined = (input_bit << (self.K - 1)) | self.state
        outputs = []
        for taps in self.taps:
            out = 0
            for i, tap in enumerate(taps):
                if tap:
                    out ^= (combined >> (self.K - 1 - i)) & 1
            outputs.append(out)
        # Shift the register: drop the oldest bit, keep the K-1 newest
        self.state = (combined >> 1) & ((1 << (self.K - 1)) - 1)
        return outputs

    def encode(self, message):
        """Encode a full message, appending K-1 tail bits."""
        self.state = 0
        output = []
        for bit in message:
            output.extend(self.encode_bit(bit))
        # Tail bits flush the encoder back to state 0
        for _ in range(self.K - 1):
            output.extend(self.encode_bit(0))
        return output

Python: Encoder Demo & Test

# Create rate 1/2, K=3 encoder with generators (7, 5)
encoder = ConvolutionalEncoder(['7', '5'], K=3)

# Encode a message
message = [1, 1, 0, 1, 0]
encoded = encoder.encode(message)
print(f"Message:  {message}")
print(f"Encoded:  {encoded}")
# Output: [1,1, 0,1, 0,1, 0,0, 1,0, 1,1, 0,0]

# Show output as pairs
pairs = [(encoded[i], encoded[i+1]) for i in range(0, len(encoded), 2)]
print(f"As pairs: {pairs}")
# [(1,1), (0,1), (0,1), (0,0), (1,0), (1,1), (0,0)]

# Verify by tracing state machine manually
encoder2 = ConvolutionalEncoder(['7', '5'], K=3)
print("\nStep-by-step trace:")
print(f"{'Time':>4} {'Input':>5} {'State':>7} {'Output':>7}")
for i, bit in enumerate(message + [0, 0]):  # +tail
    state_before = format(encoder2.state, '02b')
    out = encoder2.encode_bit(bit)
    print(f"{i:>4} {bit:>5} {state_before:>7} {''.join(map(str,out)):>7}")

# Effective rate is slightly below 1/2 because of the K-1 tail bits
print(f"\nEffective rate: {len(message)}/{len(encoded)} "
      f"= {len(message)/len(encoded):.3f}")

Python: Viterbi Decoder

def viterbi_decode(received, generators, K):
    """Hard-decision Viterbi decoder for rate 1/n convolutional code."""
    n_outputs = len(generators)
    n_states = 1 << (K - 1)
    n_steps = len(received) // n_outputs

    # Precompute branch outputs for each state and input
    enc = ConvolutionalEncoder([str(g) for g in generators], K)
    branch_output = {}
    for state in range(n_states):
        for inp in [0, 1]:
            enc.state = state
            out = enc.encode_bit(inp)
            next_state = enc.state
            branch_output[(state, inp)] = (next_state, out)

    # Initialize path metrics
    INF = float('inf')
    path_metric = [INF] * n_states
    path_metric[0] = 0
    survivor = [[[] for _ in range(n_states)] for _ in range(n_steps)]

    # Forward pass: ACS
    for t in range(n_steps):
        rx = received[t * n_outputs : (t + 1) * n_outputs]
        new_metric = [INF] * n_states
        for state in range(n_states):
            if path_metric[state] == INF:
                continue
            for inp in [0, 1]:
                ns, out = branch_output[(state, inp)]
                bm = sum(r != o for r, o in zip(rx, out))
                candidate = path_metric[state] + bm
                if candidate < new_metric[ns]:
                    new_metric[ns] = candidate
                    survivor[t][ns] = ([inp] if t == 0 else
                                       survivor[t - 1][state] + [inp])
        path_metric = new_metric

    # Trace back from state 0
    return survivor[n_steps - 1][0][: -(K-1)]  # remove tail bits

Performance Characteristics


BER vs Eb/N0 Comparison

Scheme                    | Eb/N0 at BER = 10⁻⁵ | Coding gain
Uncoded BPSK              | 9.6 dB              | -
R=1/2 K=3 hard Viterbi    | 6.5 dB              | 3.1 dB
R=1/2 K=7 hard Viterbi    | 5.2 dB              | 4.4 dB
R=1/2 K=7 soft Viterbi    | 3.3 dB              | 6.3 dB
R=1/2 K=7 soft + RS outer | 2.5 dB              | 7.1 dB
Shannon limit (R=1/2)     | 0.2 dB              | 9.4 dB
Concatenated coding (Viterbi inner + RS outer) was the workhorse of deep space communication from 1977 to the early 2000s, operating within ~2 dB of Shannon's limit.

Summary

Convolutional Codes

  • Memory-based encoding via shift registers
  • Characterized by rate R, constraint length K
  • dfree determines error correction power
  • Trellis representation enables efficient decoding

Viterbi Algorithm

  • Optimal (ML) sequence decoder
  • Complexity: O(2ᴷ) per time step
  • Soft decision: ~2 dB gain over hard
  • One of the most important algorithms in history

Legacy & Impact

Voyager (1977) → GSM (1991) → WiFi (1999) → LTE (2009) → Turbo/LDPC successors
Next up: BCH codes — powerful algebraic block codes built on finite field theory.