Coding Theory Series — 06

Convolutional Codes & Viterbi Decoding

Memory-based encoding, trellis representations, and the algorithm that reached the edge of the solar system

Mathematics Code History Applications

Roadmap

  • Origins: Elias (1955), Viterbi (1967), history and motivation
  • Encoder: shift registers, generator polynomials, state machines
  • Representations: state diagrams, trellis diagrams, code trees
  • Viterbi Algorithm: path metrics, branch metrics, survivor paths, trace-back
  • Advanced Topics: soft decisions, puncturing, sequential decoding
  • Applications: deep space, GSM, WiFi, modems, and more

Historical Origins


Peter Elias (1955)

At MIT, Elias introduced convolutional codes as an alternative to block codes. Unlike block codes that operate on fixed-size message blocks, convolutional codes process data as a continuous stream.

Elias was motivated by the impracticality of achieving Shannon's capacity with block codes of the lengths available at the time.

Andrew Viterbi (1967)

At UCLA/Linkabit, Viterbi invented the decoding algorithm that bears his name. Originally described for decoding convolutional codes, the Viterbi algorithm is now recognized as a general solution for finding the most likely sequence of hidden states.

Viterbi later co-founded Qualcomm. The algorithm is used in speech recognition, DNA sequence analysis, and more.

Impact: Convolutional codes + Viterbi decoding became the standard for reliable communication in space, cellular, and data networks for over 40 years.

Block Codes vs. Convolutional Codes

Property  | Block codes            | Convolutional codes
----------|------------------------|-----------------------------------
Input     | Fixed k-bit blocks     | Continuous bit stream
Memory    | None (memoryless)      | Depends on previous m inputs
Encoding  | Matrix multiplication  | Shift register + XOR
Output    | n-bit codewords        | n bits per input bit (continuous)
Decoding  | Syndrome, algebraic    | Viterbi (trellis search)
Latency   | One block delay        | About 5× constraint length typical
Key insight: Convolutional codes spread redundancy across the bit stream rather than within fixed blocks, enabling better performance at the same rate.

Encoder Structure

Shift Register Encoder

[Figure: the input bit uₜ enters a chain of delay registers s₁, s₂; XOR networks tap the input and register contents to form the output bits.]

Parameters

  • Rate R = k/n: k input bits produce n output bits
  • Constraint length K: number of memory stages plus one — the number of input bits that influence each output
  • Memory m = K - 1: number of delay elements
  • States: 2ᵐ possible encoder states

Generator Polynomials

Each output is a mod-2 sum of selected register taps:

g⁽¹⁾ = (1, 1, 1) → output 1 = u ⊕ s₁ ⊕ s₂
g⁽²⁾ = (1, 0, 1) → output 2 = u ⊕ s₂

Octal notation: g₁ = 7, g₂ = 5 (common rate-1/2 code)
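As a quick sanity check, the two tap equations above can be written directly; this is a minimal sketch, and the helper name `encoder_outputs` is ours:

```python
def encoder_outputs(u, s1, s2):
    """Outputs of the rate-1/2 (7,5) encoder for input u and state (s1, s2)."""
    v1 = u ^ s1 ^ s2   # g(1) = (1,1,1): taps on input, s1, s2
    v2 = u ^ s2        # g(2) = (1,0,1): taps on input and s2
    return v1, v2

# First input bit 1 from the all-zero state emits the pair (1, 1)
print(encoder_outputs(1, 0, 0))   # -> (1, 1)
```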

Generator Polynomials


Polynomial Representation

The encoder output can be described as convolution (hence the name!):

v⁽ʲ⁾(D) = u(D) · g⁽ʲ⁾(D)    (mod 2)

where D is the delay operator (D-transform).

Example: Rate 1/2, K=3 Encoder (the classic textbook code)

g₁(D) = 1 + D + D²

Binary: 111, Octal: 7

Taps: input, s₁, s₂

g₂(D) = 1 + D²

Binary: 101, Octal: 5

Taps: input, s₂ only

Convention: Generator polynomials are often written in octal: (7, 5) for K=3 rate-1/2. NASA's Voyager used (171, 133) — a K=7 rate-1/2 code.
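Since encoding is polynomial multiplication over GF(2), a few lines of convolution reproduce the encoder's two output streams; the helper name `poly_mul_mod2` is ours:

```python
def poly_mul_mod2(u, g):
    """Multiply bit sequences as polynomials in D over GF(2), i.e. convolve mod 2."""
    out = [0] * (len(u) + len(g) - 1)
    for i, ui in enumerate(u):
        for j, gj in enumerate(g):
            out[i + j] ^= ui & gj
    return out

u  = [1, 1, 0, 1, 0]   # message u(D) = 1 + D + D^3
g1 = [1, 1, 1]         # g1(D) = 1 + D + D^2  (octal 7)
g2 = [1, 0, 1]         # g2(D) = 1 + D^2      (octal 5)

v1 = poly_mul_mod2(u, g1)
v2 = poly_mul_mod2(u, g2)
# Interleaving v1 and v2 gives the encoder's output stream, tail included
print(list(zip(v1, v2)))
```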

State Diagram

Mathematics

Rate 1/2, K=3 Encoder — States: {00, 01, 10, 11}

State Transitions

Current | Input 0: Next / Output | Input 1: Next / Output
00      | 00 / 00                | 10 / 11
01      | 00 / 11                | 10 / 00
10      | 01 / 10                | 11 / 01
11      | 01 / 01                | 11 / 10

Reading the Table

  • State = contents of the shift registers (s₁s₂)
  • On input u: the register shifts right (s₂ ← s₁, s₁ ← u)
  • Output: computed from the taps via XOR

The state diagram is a finite state machine with 2ᵐ = 4 states.

State diagram vs code tree: State diagram shows all possible transitions compactly. The code tree expands every path — exponential growth but shows complete history.
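The transition table can also be generated programmatically. A small sketch with our own helper names, representing the state as a (s₁, s₂) tuple:

```python
def step(state, u, gens=((1, 1, 1), (1, 0, 1))):
    """One encoder step for the (7,5) code: return (next_state, outputs).

    state is the tuple (s1, s2); on each input the register shifts right.
    """
    bits = (u,) + state                                       # (u, s1, s2)
    out = tuple(sum(b & t for b, t in zip(bits, g)) % 2 for g in gens)
    return (u,) + state[:-1], out                             # s1 <- u, s2 <- old s1

for s1 in (0, 1):
    for s2 in (0, 1):
        cells = []
        for u in (0, 1):
            (n1, n2), (o1, o2) = step((s1, s2), u)
            cells.append(f"in {u}: -> {n1}{n2} / out {o1}{o2}")
        print(f"state {s1}{s2}   " + "   ".join(cells))
```

The printed rows match the state transition table above.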

Trellis Diagram


The Key to Viterbi Decoding

The trellis unfolds the state diagram in time. Each column represents one time step; each node represents a state.

Time:      t=0        t=1        t=2        t=3

State 00:   00 ——00—→  00 ——00—→  00 ——00—→  00
State 01:              01         01         01
State 10:              10         10         10
State 11:              11         11         11

Solid lines = input 0, dashed = input 1. Labels = output bits. (Only the all-zeros path is drawn here; in the full trellis every state has two outgoing branches per step, so paths fan out and re-merge.)

Crucial property: Many paths merge at each state. At most 2ᵐ surviving paths need to be tracked — this is what makes Viterbi tractable!

Code Rate & Constraint Length


Code Rate R = k/n

Common rates:

Rate | Meaning      | Use case
1/2  | 1 in → 2 out | Deep space, WiFi
1/3  | 1 in → 3 out | Very noisy channels
2/3  | 2 in → 3 out | Bandwidth-limited
3/4  | 3 in → 4 out | Punctured codes

Constraint Length K

K determines the encoder's memory and code strength:

K  | States | Viterbi complexity
3  | 4      | Trivial
5  | 16     | Easy
7  | 64     | Standard (Voyager)
9  | 256    | Moderate
14 | 8192   | Near practical limit

Complexity trade-off: Viterbi decoder complexity grows as O(2ᴷ) — exponential in constraint length! This is why K rarely exceeds ~10 in practice.

Free Distance


The Key Performance Metric

The free distance dfree is the minimum Hamming distance between any two codeword sequences that start and end in the same state:

dfree = min { d(v, v') : v ≠ v', both start and end at state 0 }

Equivalently: the minimum weight path that diverges from and re-merges with the all-zeros path.

Error Correction Capability

t = ⌊(dfree - 1) / 2⌋

For the (7,5) code: dfree = 5, so t = 2.

Common Codes

Code                  | dfree
R=1/2, K=3 (7,5)      | 5
R=1/2, K=5 (23,35)    | 7
R=1/2, K=7 (171,133)  | 10
R=1/3, K=3 (7,7,5)    | 8
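Because every branch has a non-negative output weight, dfree can be computed as a shortest-path problem over the state graph: force one divergence from state 0, then run Dijkstra until the path re-merges. This is a sketch with our own helper names:

```python
import heapq

def free_distance(gens_octal, K):
    """Minimum output weight over paths that diverge from state 0 and
    later re-merge with it (Dijkstra: branch weights are non-negative)."""
    m = K - 1
    gens = [int(g, 8) for g in gens_octal]

    def branch(state, u):
        """Next state and output Hamming weight for input u from state."""
        bits = (u << m) | state    # register contents including the new input
        weight = sum(bin(bits & g).count("1") % 2 for g in gens)
        return bits >> 1, weight

    # Force the initial divergence: take input 1 out of state 0
    start, w0 = branch(0, 1)
    dist = {start: w0}
    heap = [(w0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if s == 0:
            return d               # first arrival back at state 0 is minimal
        if d > dist.get(s, float("inf")):
            continue
        for u in (0, 1):
            ns, w = branch(s, u)
            if d + w < dist.get(ns, float("inf")):
                dist[ns] = d + w
                heapq.heappush(heap, (d + w, ns))

print(free_distance(("7", "5"), 3))       # -> 5
print(free_distance(("171", "133"), 7))   # -> 10
```

Both results agree with the table above.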

The Viterbi Algorithm


Maximum Likelihood Sequence Estimation

Given received sequence r, find the most likely transmitted sequence v:

v̂ = argmax_v P(r | v) = argmin_v d(r, v)

The decoder proceeds in five steps:

1. Initialize path metrics (state 0 gets metric 0, all others ∞)
2. Compute branch metrics for the received symbols
3. Add-Compare-Select (ACS) at every state
4. Store the survivor path into each state
5. Repeat steps 2-4 for each time step, then trace back
Key insight: At each state, only the BEST path (survivor) needs to be kept. All other paths merging at that state are provably suboptimal and can be discarded.

Branch & Path Metrics


Branch Metric

Distance between received bits and expected output on each trellis branch:

BM = dH(received, expected)

Hard decision: Hamming distance (count differing bits)

Soft decision: Euclidean distance or correlation metric

Path Metric

Cumulative metric along a path through the trellis:

PM(state, t) = PM(prev, t-1) + BM

The path with the smallest cumulative metric is the most likely (for hard decisions).

Add-Compare-Select (ACS)

For each state at time t, exactly 2 paths arrive. Add branch metric to each predecessor's path metric, compare the two candidates, select the survivor (lower metric wins).
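In miniature (the names are ours), one ACS decision looks like this:

```python
def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-Compare-Select for one state: two paths arrive, the better survives.

    Returns (survivor metric, index of the winning predecessor: 0 or 1).
    """
    cand_a = pm_a + bm_a   # add
    cand_b = pm_b + bm_b
    if cand_a <= cand_b:   # compare (ties broken toward predecessor 0)
        return cand_a, 0   # select
    return cand_b, 1

# Predecessor A: path metric 2, branch metric 1; predecessor B: 0 and 2.
print(acs(2, 1, 0, 2))   # -> (2, 1): the path through B survives
```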

Viterbi: Worked Example (1/3)


Setup: Rate 1/2, K=3, generators (7,5)

Transmitted message: u = [1, 1, 0, 1, 0]

Encoded output (message plus two tail bits): v = [11, 01, 01, 00, 10, 11, 00]

Received with errors: r = [11, 00, 01, 00, 10, 11, 00]

(Error in second pair: received 00 instead of 01)

Step 1: Initialize

State      | Metric at t=0
00 (start) | 0
01, 10, 11 | ∞ (impossible)

Step 2: t=0, received r₁ = 11

From state 00: input 0 → output 00, BM = d(11,00) = 2 → state 00, PM = 2

From state 00: input 1 → output 11, BM = d(11,11) = 0 → state 10, PM = 0

Viterbi: Worked Example (2/3)

Mathematics

Step 3: t=1, received r₂ = 00

To state | From (input, output) | Path metric      | Survivor?
00       | 00 (input 0, out 00) | 2 + d(00,00) = 2 | Yes
00       | 01 (input 0, out 11) | ∞                | No
01       | 10 (input 0, out 10) | 0 + d(00,10) = 1 | Yes
01       | 11 (input 0, out 01) | ∞                | No
10       | 00 (input 1, out 11) | 2 + d(00,11) = 4 | Yes
10       | 01 (input 1, out 00) | ∞                | No
11       | 10 (input 1, out 01) | 0 + d(00,01) = 1 | Yes
11       | 11 (input 1, out 10) | ∞                | No

(States 01 and 11 are unreachable at t=1, so branches leaving them carry metric ∞.)
Notice: Even though the second received pair was corrupted, the path of the true message (now at state 11 with metric 1) remains competitive. The Viterbi algorithm keeps it alive!

Viterbi: Worked Example (3/3)


Continuing the ACS process...

After processing all received pairs, each state has a single surviving path and a cumulative path metric.

Trace-Back

Starting from state 00 at the final time step (after tail bits flush the encoder), trace the survivor path backward:

Read forward, the surviving path is:

States: 00 → 10 → 11 → 01 → 10 → 01 → 00 → 00
Inputs:    1    1    0    1    0    0    0
           (message bits)        (tail bits)

Decoded message: [1, 1, 0, 1, 0], which matches the original! The single error was successfully corrected by the Viterbi algorithm.
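The whole example can be replayed end to end. This compact, self-contained sketch (all names are ours) re-implements the (7,5) encoder and a hard-decision Viterbi pass, flips one bit in the second output pair, and recovers the message:

```python
GENS, K = (0b111, 0b101), 3          # generators (7, 5) in binary
M = K - 1                            # encoder memory

def branch(state, u):
    """One trellis branch: (next_state, output pair) for input u."""
    bits = (u << M) | state          # (input, s1, s2) as a 3-bit word
    out = [bin(bits & g).count("1") % 2 for g in GENS]
    return bits >> 1, out

def encode(msg):
    """Encode msg plus M tail bits."""
    state, out = 0, []
    for u in msg + [0] * M:
        state, o = branch(state, u)
        out.extend(o)
    return out

def viterbi(rx):
    """Hard-decision Viterbi decode; strips the tail bits."""
    INF = float("inf")
    n_states = 1 << M
    pm = [0] + [INF] * (n_states - 1)        # path metrics, start in state 0
    paths = [[] for _ in range(n_states)]    # survivor input sequences
    for t in range(len(rx) // 2):
        r = rx[2 * t: 2 * t + 2]
        new_pm = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if pm[s] == INF:
                continue
            for u in (0, 1):
                ns, o = branch(s, u)
                cand = pm[s] + sum(a != b for a, b in zip(r, o))
                if cand < new_pm[ns]:        # compare-select
                    new_pm[ns] = cand
                    new_paths[ns] = paths[s] + [u]
        pm, paths = new_pm, new_paths
    return paths[0][:-M]                     # best path ending in state 0

msg = [1, 1, 0, 1, 0]
rx = encode(msg)
rx[3] ^= 1                    # corrupt one bit of the second output pair
print(viterbi(rx) == msg)     # -> True: the single error is corrected
```

Any competing path through the trellis differs from the transmitted sequence in at least dfree = 5 coded bits, so with a single channel error the true path always wins.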

Hard vs. Soft Decision Decoding


Hard Decision

Demodulator outputs 0 or 1 (binary).

Branch metric: Hamming distance

BM = Σᵢ |rᵢ - vᵢ|

Simpler hardware, but throws away reliability information.

Soft Decision

Demodulator outputs analog values (e.g., 3-bit quantization: 0-7).

Branch metric: Euclidean or correlation

BM = Σᵢ (rᵢ - vᵢ)²

More complex but preserves channel reliability.

Soft decision gain: approximately 2 dB improvement over hard decision — equivalent to doubling the transmitter power or halving the distance! This is practically "free" performance.

Most modern Viterbi decoders use 3-bit (8-level) soft quantization. Diminishing returns beyond 4 bits.
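The gain comes from keeping the demodulator's confidence. A toy comparison (all numbers invented for illustration) shows how hard quantization can destroy a decision that soft metrics get right:

```python
# BPSK mapping: bit 0 -> +1.0, bit 1 -> -1.0
branches = {"out 00": [0, 0], "out 11": [1, 1]}   # two candidate branch outputs
received = [-0.1, +0.9]   # first symbol barely negative, second strongly positive

# Hard decision: quantize to bits first, then count disagreements
hard = [1 if r < 0 else 0 for r in received]                     # -> [1, 0]
bm_hard = {k: sum(h != b for h, b in zip(hard, bits))
           for k, bits in branches.items()}

# Soft decision: squared Euclidean distance to the expected symbols
bm_soft = {k: sum((r - (1.0 if b == 0 else -1.0)) ** 2
                  for r, b in zip(received, bits))
           for k, bits in branches.items()}

print(bm_hard)   # both branches at Hamming distance 1: a coin flip
print(bm_soft)   # "out 00" is clearly closer; the reliability survives
```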

Punctured Convolutional Codes


Achieving Higher Rates from a Base Code

Start with a low-rate code (e.g., R=1/2) and periodically delete (puncture) some output bits to increase the effective rate.

Puncture Pattern Example

Base: R=1/2, K=7

P = [ 1 1 ] → R = 2/3
    [ 1 0 ]

1 = transmit, 0 = puncture

Every 2 input bits: transmit 3 of 4 output bits.

Advantages

  • Flexible rate: one encoder/decoder supports many rates
  • Rate-compatible: can adapt to channel conditions
  • Used in: WiFi (802.11), LTE, DVB

Decoder inserts "erasures" at punctured positions and runs standard Viterbi.
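A sketch of the mechanics, using the pattern above (the function names are ours):

```python
# Puncture pattern for rate 2/3 from a rate-1/2 mother code:
# row j = encoder output j, column = position in the 2-input period.
P = [[1, 1],
     [1, 0]]     # 1 = transmit, 0 = delete

def puncture(stream, P):
    """Delete the mother-code output bits marked 0 in the cyclic pattern."""
    n, period = len(P), len(P[0])
    out = []
    for i in range(0, len(stream), n):
        col = (i // n) % period
        out.extend(stream[i + j] for j in range(n) if P[j][col])
    return out

def depuncture(stream, P, n_blocks, erasure=None):
    """Re-insert erasures at punctured positions so standard Viterbi can run."""
    n, period = len(P), len(P[0])
    it = iter(stream)
    out = []
    for i in range(n_blocks):
        col = i % period
        out.extend([next(it) if P[j][col] else erasure for j in range(n)])
    return out

coded = [1, 1, 0, 1, 0, 1, 0, 0]    # four rate-1/2 output pairs
sent = puncture(coded, P)           # 6 of 8 bits survive -> rate 2/3
print(sent)                         # -> [1, 1, 0, 0, 1, 0]
print(depuncture(sent, P, 4))       # erasures (None) at the deleted positions
```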

Applications: Deep Space


Convolutional Codes in Space Exploration

Voyager 1 & 2 (1977)

R=1/2, K=7 convolutional code (171, 133)

Later concatenated with RS outer code

Still transmitting from interstellar space at 160 bps!

Mars Rovers (2004+)

Turbo codes (parallel concatenated convolutional codes)

Operate within about 0.8 dB of the Shannon limit

New Horizons (2015)

K=15 convolutional code for Pluto flyby

Signal power: 10⁻¹⁶ watts at Earth!

James Webb (2022)

Modern LDPC codes, but heritage from convolutional era

Applications: Terrestrial Communications


GSM Cellular (2G)

R=1/2, K=5 convolutional code

Different rates for voice (R=1/2) and data (R=1/3)

Deployed 1991, billions of phones

WiFi (802.11a/g/n)

R=1/2, K=7 base code (171, 133)

Punctured to R=2/3, 3/4, 5/6

Viterbi decoder in every WiFi chip

Dial-Up Modems (V.32/V.34)

Trellis-coded modulation (TCM)

Ungerboeck's coding + modulation

Enabled 28.8/33.6 kbps over phone lines

Digital TV (DVB-T)

Convolutional inner code + RS outer code

Multiple rate options for different conditions

Replaced by LDPC in DVB-T2

Sequential Decoding


An Alternative to Viterbi

For large constraint lengths (K > 15), Viterbi is impractical. Sequential decoding explores the trellis selectively rather than exhaustively.

Fano Algorithm (1963)

  • Explores code tree depth-first
  • Uses a running threshold
  • Moves forward if metric improves
  • Backs up and tries alternatives otherwise
  • Adjusts threshold dynamically

Variable computation time — fast when channel is good.

Stack Algorithm (Zigangirov-Jelinek)

  • Maintains a sorted stack of partial paths
  • Always extends the best path
  • Simpler logic than Fano
  • Requires more memory (stack grows)

In principle finds the ML path, given enough stack memory.

Average vs. worst case: Sequential decoders have variable computation — occasional "computational blowup" when noise is severe. Not suitable for hard real-time systems.

Python: Convolutional Encoder

class ConvolutionalEncoder:
    """Rate 1/n convolutional encoder with constraint length K."""
    def __init__(self, generators, K):
        """
        generators: list of generator polynomials (octal strings)
        K: constraint length (memory m = K - 1)
        """
        self.K = K
        self.n = len(generators)
        # Parse each octal generator once into K tap bits, MSB first:
        # tap 0 = current input, tap i = register s_i.
        self.taps = [[int(b) for b in bin(int(str(g), 8))[2:].zfill(K)]
                     for g in generators]
        self.state = 0  # shift register contents

    def encode_bit(self, input_bit):
        """Encode a single input bit; return the n output bits."""
        # Prepend the new input bit to the register contents
        combined = (input_bit << (self.K - 1)) | self.state
        outputs = []
        for taps in self.taps:
            out = 0
            for i, tap in enumerate(taps):
                if tap:
                    out ^= (combined >> (self.K - 1 - i)) & 1
            outputs.append(out)
        # Shift the register: drop the oldest bit, keep the K-1 newest
        self.state = (combined >> 1) & ((1 << (self.K - 1)) - 1)
        return outputs

    def encode(self, message):
        """Encode a full message, appending K-1 tail bits."""
        self.state = 0
        output = []
        for bit in message:
            output.extend(self.encode_bit(bit))
        # Tail bits flush the encoder back to state 0
        for _ in range(self.K - 1):
            output.extend(self.encode_bit(0))
        return output

Python: Encoder Demo & Test

# Create rate 1/2, K=3 encoder with generators (7, 5)
encoder = ConvolutionalEncoder(['7', '5'], K=3)

# Encode a message
message = [1, 1, 0, 1, 0]
encoded = encoder.encode(message)
print(f"Message:  {message}")
print(f"Encoded:  {encoded}")
# Output: [1,1, 0,1, 0,1, 0,0, 1,0, 1,1, 0,0]

# Show output as pairs
pairs = [(encoded[i], encoded[i+1]) for i in range(0, len(encoded), 2)]
print(f"As pairs: {pairs}")
# [(1,1), (0,1), (0,1), (0,0), (1,0), (1,1), (0,0)]

# Verify by tracing state machine manually
encoder2 = ConvolutionalEncoder(['7', '5'], K=3)
print("\nStep-by-step trace:")
print(f"{'Time':>4} {'Input':>5} {'State':>7} {'Output':>7}")
for i, bit in enumerate(message + [0, 0]):  # +tail
    state_before = format(encoder2.state, '02b')
    out = encoder2.encode_bit(bit)
    print(f"{i:>4} {bit:>5} {state_before:>7} {''.join(map(str,out)):>7}")

# Effective rate is slightly below 1/2 because of the K-1 tail bits
print(f"\nEffective rate: {len(message)}/{len(encoded)} "
      f"= {len(message)/len(encoded):.3f}")

Python: Viterbi Decoder

def viterbi_decode(received, generators, K):
    """Hard-decision Viterbi decoder for rate 1/n convolutional code."""
    n_outputs = len(generators)
    n_states = 1 << (K - 1)
    n_steps = len(received) // n_outputs

    # Precompute branch outputs for each state and input
    enc = ConvolutionalEncoder([str(g) for g in generators], K)
    branch_output = {}
    for state in range(n_states):
        for inp in [0, 1]:
            enc.state = state
            out = enc.encode_bit(inp)
            next_state = enc.state
            branch_output[(state, inp)] = (next_state, out)

    # Initialize path metrics
    INF = float('inf')
    path_metric = [INF] * n_states
    path_metric[0] = 0
    survivor = [[[] for _ in range(n_states)] for _ in range(n_steps)]

    # Forward pass: ACS
    for t in range(n_steps):
        rx = received[t * n_outputs : (t + 1) * n_outputs]
        new_metric = [INF] * n_states
        for state in range(n_states):
            if path_metric[state] == INF:
                continue
            for inp in [0, 1]:
                ns, out = branch_output[(state, inp)]
                bm = sum(r != o for r, o in zip(rx, out))
                candidate = path_metric[state] + bm
                if candidate < new_metric[ns]:
                    new_metric[ns] = candidate
                    survivor[t][ns] = ([inp] if t == 0 else
                                       survivor[t - 1][state] + [inp])
        path_metric = new_metric

    # Trace back from state 0
    return survivor[n_steps - 1][0][: -(K-1)]  # remove tail bits

Performance Characteristics


BER vs Eb/N0 Comparison

Scheme                    | Eb/N0 at BER = 10⁻⁵ | Coding gain
Uncoded BPSK              | 9.6 dB              | -
R=1/2 K=3 hard Viterbi    | 6.5 dB              | 3.1 dB
R=1/2 K=7 hard Viterbi    | 5.2 dB              | 4.4 dB
R=1/2 K=7 soft Viterbi    | 3.3 dB              | 6.3 dB
R=1/2 K=7 soft + RS outer | 2.5 dB              | 7.1 dB
Shannon limit (R=1/2)     | 0.2 dB              | 9.4 dB
Concatenated coding (Viterbi inner + RS outer) was the workhorse of deep space communication from 1977 to the early 2000s, operating within ~2 dB of Shannon's limit.

Summary

Convolutional Codes

  • Memory-based encoding via shift registers
  • Characterized by rate R, constraint length K
  • dfree determines error correction power
  • Trellis representation enables efficient decoding

Viterbi Algorithm

  • Optimal (ML) sequence decoder
  • Complexity: O(2ᴷ) per time step
  • Soft decision: ~2 dB gain over hard
  • One of the most important algorithms in history

Legacy & Impact

Voyager (1977) → GSM (1991) → WiFi (1999) → LTE (2009) → Turbo/LDPC successors
Next up: BCH codes — powerful algebraic block codes built on finite field theory.