Coding Theory Series — 12

Polar Codes

The first codes provably achieving channel capacity with efficient encoding and decoding — from Arikan's elegant theory to the 5G control channel.

Mathematics Code History Standards

Roadmap

Origins

Arikan's 2009 breakthrough and the concept of channel polarization.

Polarization

How channels split into perfect and useless — the core phenomenon.

Construction

Choosing information bits and the Kronecker product encoding.

Decoding

Successive cancellation, list decoding, and CRC-aided variants.

5G Standards

How polar codes won the battle for 5G control channels.

Comparisons

Polar vs LDPC vs turbo — strengths and trade-offs.

Arikan's 2009 Breakthrough

History

The Holy Grail

Since Shannon (1948), the field searched for codes that:

  • Provably achieve channel capacity
  • Have efficient encoding: O(n log n)
  • Have efficient decoding: O(n log n)
  • Work for any symmetric channel

For 60 years, no code satisfied all four conditions simultaneously.

Erdal Arikan

Professor at Bilkent University, Turkey. His 2009 paper "Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Discrete Memoryless Channels" solved the problem.

Polar codes are the first family of codes with a rigorous, explicit proof of achieving Shannon capacity with polynomial encoding and decoding complexity.

Arikan received the IEEE Richard W. Hamming Medal (2018) and the Claude E. Shannon Award (2019) for this contribution.

Channel Polarization

Mathematics

Start with N copies of a channel W. Apply a specific linear transform. The resulting N "virtual channels" polarize: they become either nearly perfect (capacity → 1) or nearly useless (capacity → 0).

The Basic Transform (N=2)

Two uses of channel W with capacity I(W):

u1 → x1 = u1 ⊕ u2 → W → y1
u2 → x2 = u2        → W → y2

W-: Channel for u1 (seeing only y1, y2) — worse than W

W+: Channel for u2 (seeing y1, y2, u1) — better than W

Conservation Law

I(W-) + I(W+) = 2 I(W)

Total capacity is preserved! But:

I(W-) ≤ I(W) ≤ I(W+)

One channel gets better, the other gets worse. Capacity is redistributed, not created or destroyed.
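For the BEC this is easy to check numerically: BEC(ε) has capacity 1 - ε, and the basic transform turns two copies into BEC(2ε - ε²) and BEC(ε²). A minimal sketch:

```python
# Conservation of capacity under the basic polar transform, checked
# for the binary erasure channel BEC(eps), whose capacity is 1 - eps.
def bec_split(eps):
    """One polarization step on BEC(eps): returns (eps_minus, eps_plus)."""
    return 2 * eps - eps**2, eps**2   # W- is BEC(2e - e^2), W+ is BEC(e^2)

eps = 0.5
e_minus, e_plus = bec_split(eps)
I = 1 - eps
I_minus, I_plus = 1 - e_minus, 1 - e_plus

print(I_minus, I_plus)   # 0.25 and 0.75: one channel worse, one better
assert abs((I_minus + I_plus) - 2 * I) < 1e-12   # I(W-) + I(W+) = 2 I(W)
assert I_minus <= I <= I_plus
```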

The Polarization Phenomenon

Mathematics

Recursive Application

Apply the basic transform recursively n = log2(N) times. After n levels, N = 2^n virtual channels emerge.

Polarization Theorem

As N → ∞, the fraction of channels with capacity near 1 approaches I(W), and the fraction with capacity near 0 approaches 1 - I(W).

For any δ > 0:
#{i : I(W_N^(i)) > 1-δ} / N → I(W)
#{i : I(W_N^(i)) < δ} / N → 1-I(W)

Example: BEC(ε=0.5)

N     | Good (>0.99) | Bad (<0.01) | Middle
4     | 1 (25%)      | 1 (25%)    | 2 (50%)
16    | 5 (31%)      | 5 (31%)    | 6 (38%)
256   | 107 (42%)    | 107 (42%)  | 42 (16%)
4096  | 1946 (48%)   | 1946 (48%) | 204 (5%)
N → ∞ | 50%          | 50%        | 0%

Polarization converges slowly: the "middle" channels vanish only as O(2^(-N^β)) with β < 0.5. This means polar codes need long block lengths for best performance.
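These fractions can be computed directly from the Z-parameter recursion of the next section, which is exact for the BEC. A sketch using the threshold δ = 0.01 (counts at small N are sensitive to the choice of δ):

```python
def bec_polarize(eps, n):
    """Bhattacharyya parameters of the 2^n virtual channels for BEC(eps).
    For the BEC the recursion is exact; channel i has capacity 1 - z[i]."""
    z = [eps]
    for _ in range(n):
        z = [w for zi in z for w in (2 * zi - zi**2, zi**2)]
    return z

eps, delta = 0.5, 0.01
for n in (2, 6, 10, 12):
    z = bec_polarize(eps, n)
    N = len(z)
    good = sum(zi < delta for zi in z)        # capacity > 1 - delta
    bad = sum(zi > 1 - delta for zi in z)     # capacity < delta
    print(f"N={N:5d}  good={good/N:.0%}  bad={bad/N:.0%}  "
          f"middle={(N - good - bad)/N:.0%}")
    # Total capacity is conserved at every polarization level:
    assert abs(sum(1 - zi for zi in z) - N * (1 - eps)) < 1e-9
```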

Bhattacharyya Parameter & Channel Reliability

Mathematics
Z(W) = Σ_y √(W(y|0) · W(y|1))

What It Measures

Z(W) ∈ [0, 1] quantifies channel reliability:

Z(W) = 0: Perfect channel (noiseless)

Z(W) = 1: Useless channel (pure noise)

Upper bounds error probability: Pe ≤ Z(W)

Recursive Computation

Z(W-) ≤ 2Z(W) - Z(W)²   (with equality for the BEC)
Z(W+) = Z(W)²

The "minus" channel degrades, while the "plus" channel improves — quadratically!

Channel Ordering

For each of the N virtual channels, compute Z(W_N^(i)). Sort by reliability.

Best channels (lowest Z) carry information bits.

Worst channels (highest Z) carry frozen bits (set to 0).

The Bhattacharyya parameter gives us a simple, recursive way to identify which channels are reliable — this is how we construct polar codes.

Polar Code Construction

Mathematics Code

Choosing Information Bit Positions

For a polar code (N, K):

1. Compute Z(W_N^(i)) for all i = 0, ..., N-1

2. Choose K channels with smallest Bhattacharyya parameter

3. These carry information bits; the rest are "frozen" (set to known values, usually 0)

Construction Methods

Method            | Complexity  | Optimality
Bhattacharyya     | O(N)        | Exact for BEC
Density evolution | O(N · M)    | Near-optimal for AWGN
Gaussian approx.  | O(N)        | Good approximation
Monte Carlo       | Slow        | Channel-specific
5G NR tables      | O(1) lookup | Pre-computed

Frozen Bits

Frozen bits are known to both encoder and decoder. They carry no information but are crucial for the decoder to work.

Think of them as "scaffolding" — they support the decoding process but add no data.

Rate = K/N = fraction of good channels used

Polar Encoding: Kronecker Product

Mathematics
GN = BN · F⊗n    where   F = [1 0; 1 1]   and   N = 2^n

The Kernel Matrix F

The 2×2 matrix F is the fundamental building block. The Kronecker power F⊗n creates the N×N generator matrix.

F⊗2 = [1 0 0 0; 1 1 0 0; 1 0 1 0; 1 1 1 1]

BN is a bit-reversal permutation matrix.
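The Kronecker power can be built directly with NumPy's np.kron, which gives a convenient cross-check for butterfly encoders. A sketch in natural bit order (the bit-reversal permutation BN is omitted here for simplicity):

```python
import numpy as np

F = np.array([[1, 0],
              [1, 1]], dtype=int)

def kron_power(F, n):
    """n-fold Kronecker power F^(x)n of the 2x2 kernel."""
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

G2 = kron_power(F, 2)
print(G2)        # matches the 4x4 matrix F^(x)2 shown above

# Encoding is then x = u G mod 2:
u = np.array([1, 0, 1, 1])
x = u @ G2 % 2
print(x)         # [1 1 0 1]
```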

Butterfly Structure

Encoding has a recursive butterfly structure (like the FFT): the inputs u0, ..., uN-1 pass through n stages of XORs to produce the outputs x0, ..., xN-1.

Each stage XORs pairs of values. Total: n stages of N/2 XORs each.

Encoding complexity: O(N log N)

The encoding is simply: set frozen positions to 0, place information bits in good positions, then multiply by GN (using the butterfly).

Polar Encoding in Python

Code
import numpy as np

def polar_encode(info_bits, frozen_positions, N):
    """
    Polar encoder using butterfly (Kronecker) structure.

    info_bits: K information bits
    frozen_positions: set of frozen bit indices
    N: code length (must be power of 2)
    """
    n = int(np.log2(N))
    assert N == 2**n, "N must be a power of 2"

    # Place bits: frozen = 0, info = data
    u = np.zeros(N, dtype=int)
    info_idx = 0
    for i in range(N):
        if i not in frozen_positions:
            u[i] = info_bits[info_idx]
            info_idx += 1

    # Butterfly encoding (in-place Kronecker product)
    x = u.copy()
    for stage in range(n):
        step = 2 ** (stage + 1)
        half = step // 2
        for i in range(0, N, step):
            for j in range(half):
                x[i + j] ^= x[i + j + half]
                # x[i + j + half] stays the same

    return x

def construct_polar_code_bec(N, K, epsilon):
    """
    Construct polar code for BEC(epsilon).
    Returns set of frozen positions.
    """
    n = int(np.log2(N))

    # Compute Bhattacharyya parameters recursively
    z = [epsilon]  # Start with single channel
    for stage in range(n):
        new_z = []
        for zi in z:
            z_minus = 2 * zi - zi**2   # Worse channel
            z_plus = zi**2              # Better channel
            new_z.append(z_minus)
            new_z.append(z_plus)
        z = new_z

    # Sort channels by reliability (lower Z = more reliable)
    indexed = [(z[i], i) for i in range(N)]
    indexed.sort()

    # Best K channels carry information; rest are frozen
    frozen = set(idx for _, idx in indexed[K:])
    return frozen, z

# Example: (8, 4) polar code for BEC(0.5)
N, K = 8, 4
frozen, bhatt = construct_polar_code_bec(N, K, epsilon=0.5)
print(f"Frozen positions: {sorted(frozen)}")
print(f"Info positions:   {sorted(set(range(N)) - frozen)}")

info = np.array([1, 0, 1, 1])
codeword = polar_encode(info, frozen, N)
print(f"Info bits:  {info}")
print(f"Codeword:   {codeword}")

Successive Cancellation (SC) Decoding

Mathematics

SC decoding estimates bits one at a time, from u0 to uN-1, using previously decoded bits as side information — mimicking the genie that knows u1, ..., ui-1 in the polarization analysis.

Algorithm

For each bit position i = 0, 1, ..., N-1:

If i is frozen: Set ûi = 0 (known)

If i is information: Compute LLR from received signal y and previously decoded bits û0...ûi-1. Decide: ûi = 0 if LLR ≥ 0, else 1.

Complexity

O(N log N)

Same butterfly structure as encoding, but processing LLRs instead of bits. Like a "reverse FFT" on log-likelihoods.

LLR Recursions

f(a, b) = 2 tanh⁻¹(tanh(a/2) · tanh(b/2))

g(a, b, u) = (-1)^u · a + b

f-function: combines two LLRs (check-node-like)

g-function: uses a decoded bit to refine (variable-node-like)

SC decoding is sequential — each bit depends on all previous bits. This makes it inherently serial and limits throughput. Also, a single wrong decision propagates errors.
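In practice the exact f-function is usually replaced by the min-sum approximation (same sign, magnitude capped at min(|a|, |b|)), which is what the decoder sketch later in this article uses. A quick comparison:

```python
import math

def f_exact(a, b):
    """Exact check-node update: 2 atanh(tanh(a/2) tanh(b/2))."""
    return 2 * math.atanh(math.tanh(a / 2) * math.tanh(b / 2))

def f_minsum(a, b):
    """Hardware-friendly approximation: sign(a) sign(b) min(|a|, |b|)."""
    return math.copysign(1, a) * math.copysign(1, b) * min(abs(a), abs(b))

def g(a, b, u):
    """Variable-node-like update using the already-decoded bit u."""
    return b + (1 - 2 * u) * a

a, b = 2.0, 3.0
print(f_exact(a, b), f_minsum(a, b))   # min-sum slightly overestimates magnitude
print(g(a, b, 0), g(a, b, 1))          # 5.0 and 1.0
```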

SC List (SCL) Decoding

Mathematics

The Key Improvement

Instead of making a single hard decision at each step, SCL maintains a list of L candidate paths. At each information bit the list doubles (trying both 0 and 1) and is then pruned back to the L best paths.

Algorithm

1. Start with one path (all frozen bits known)

2. At frozen bit: extend all paths with known value

3. At info bit: split each path into two (0 and 1)

4. Keep only the L most likely paths (by path metric)

5. After all N bits, select the best surviving path
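Steps 3-4, the list doubling and pruning, can be sketched independently of the SC metric computation. The Hamming-distance metric below is a stand-in for illustration only, not a real polar path metric:

```python
import heapq

def scl_skeleton(n_bits, frozen, L, path_metric):
    """Schematic SCL list management: double at info bits, prune to L.
    path_metric(bits) is a caller-supplied stand-in (lower = better);
    a real SCL decoder computes the metric incrementally from LLRs."""
    paths = [()]                                   # one path, nothing decoded
    for i in range(n_bits):
        if i in frozen:
            paths = [p + (0,) for p in paths]      # extend with the known bit
        else:
            paths = [p + (b,) for p in paths for b in (0, 1)]   # split
            paths = heapq.nsmallest(L, paths, key=path_metric)  # prune to L
    return min(paths, key=path_metric)             # best surviving path

# Toy metric: Hamming distance to a fixed target sequence (illustration only)
target = (0, 1, 1, 0, 1, 0, 0, 1)
metric = lambda p: sum(a != b for a, b in zip(p, target))
best = scl_skeleton(8, frozen=set(), L=4, path_metric=metric)
print(best)   # the zero-distance path survives every pruning
```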

Performance vs List Size

L        | Performance   | Complexity
1        | SC (baseline) | O(N log N)
4        | ~0.5 dB gain  | 4× SC
8        | ~0.7 dB gain  | 8× SC
32       | ~1.0 dB gain  | 32× SC
ML bound | Optimal       | Exponential

SCL with L=32 approaches maximum-likelihood decoding performance for practical block lengths. The gain over basic SC is substantial — this made polar codes competitive.

CRC-Aided SCL Decoding (CA-SCL)

Mathematics Standards

The breakthrough that made polar codes practical: append a CRC to the information bits before encoding. The CRC helps the list decoder select the correct path.

How It Works

1. Before encoding: compute CRC over K' info bits, append r CRC bits. Total K = K' + r info bits encoded.

2. Decode with SCL (list size L)

3. Among L surviving paths, pick the one that passes the CRC check

4. If no path passes CRC, report decoding failure
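Step 3 can be sketched with a toy 3-bit CRC; the polynomial here is purely illustrative, while 5G NR defines its CRC-6/11/24 polynomials in TS 38.212:

```python
def crc_bits(bits, poly=0b1011, r=3):
    """Toy 3-bit CRC (polynomial x^3 + x + 1) over a bit list, by long
    division. Illustrative only; 5G NR uses CRC-6/11/24 from TS 38.212."""
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j in range(r + 1):
                reg[i + j] ^= (poly >> (r - j)) & 1
    return reg[-r:]

def ca_select(candidate_paths, k):
    """Among surviving SCL paths, return the first whose trailing CRC bits
    check out over its first k info bits; None means decoding failure."""
    for path in candidate_paths:
        if crc_bits(path[:k]) == list(path[k:]):
            return path
    return None

info = [1, 0, 1, 1, 0]
tx = info + crc_bits(info)          # append CRC before polar encoding
wrong = tx[:]
wrong[0] ^= 1                       # a corrupted competitor path
print(ca_select([wrong, tx], k=5))  # picks the CRC-consistent path
```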

Performance Impact

CA-SCL with L=32 and CRC-11:

~1.5 dB better than SC alone

Matches or exceeds turbo/LDPC at short blocks

Reliable error detection via CRC

5G NR: CRC-6 for short blocks
CRC-11 for medium blocks
CRC-24 for long blocks

The CRC bits reduce the effective code rate slightly (e.g., 11 CRC bits out of 128 info bits = 8.6% overhead). But the performance gain far outweighs this cost.

SC Decoder in Python

Code
import numpy as np

def sc_decode(llr, frozen_positions, N):
    """
    Successive Cancellation decoder for polar codes.
    llr: channel LLRs (length N), positive = bit 0 more likely
    frozen_positions: set of frozen bit indices
    Returns: decoded bits u-hat (length N, frozen positions included)
    """
    n = int(np.log2(N))
    decoded = np.zeros(N, dtype=int)

    def recurse(llrs, depth, node_idx):
        """Decode the subtree at (depth, node_idx). Stores leaf decisions
        in `decoded` and returns the re-encoded bits of this subtree,
        which the g-function needs one level up."""
        if depth == n:
            # Leaf: frozen bits are known to be 0; info bits by LLR sign
            bit = 0 if (node_idx in frozen_positions or llrs[0] >= 0) else 1
            decoded[node_idx] = bit
            return np.array([bit])

        half = len(llrs) // 2

        # f-function: min-sum approximation of the exact tanh rule
        f_llr = np.sign(llrs[:half]) * np.sign(llrs[half:]) * \
                np.minimum(np.abs(llrs[:half]), np.abs(llrs[half:]))

        # Decode left (upper) half
        left = recurse(f_llr, depth + 1, 2 * node_idx)

        # g-function: use left decisions to refine the lower LLRs
        g_llr = llrs[half:] + (1 - 2 * left) * llrs[:half]

        # Decode right (lower) half
        right = recurse(g_llr, depth + 1, 2 * node_idx + 1)

        # Re-encode: upper = left XOR right, lower = right
        return np.concatenate([left ^ right, right])

    recurse(llr, 0, 0)
    return decoded

# Example: decode (8,4) polar code
N = 8
frozen = {0, 1, 2, 4}  # Example frozen set

# Simulated received LLRs (positive = likely 0)
channel_llr = np.array([2.1, -0.5, 1.3, -1.8, 0.9, -2.3, 1.1, -0.7])
decoded = sc_decode(channel_llr, frozen, N)
print(f"Decoded bits: {decoded}")

Polar vs LDPC vs Turbo

Mathematics
Property             | Polar (CA-SCL)              | LDPC                | Turbo
Capacity-achieving?  | Proven                      | Empirical (near)    | Empirical (near)
Short blocks (N<256) | Excellent                   | Poor                | Good
Long blocks (N>10K)  | Good                        | Excellent           | Good
Decoding latency     | High (serial SC)            | Low (parallel BP)   | Medium
Error floor          | None (SC)                   | Low (trapping sets) | Moderate
Rate flexibility     | Any K/N (change frozen set) | Needs new H matrix  | Puncturing
Encoding             | O(N log N), simple          | O(N) QC, complex    | O(N), simple

5G NR: Polar Codes for Control Channels

Standards

3GPP selected polar codes for control channels (PDCCH, PUCCH) in 5G NR Release 15 (2018). This was a historic first deployment of polar codes in a major standard.

5G NR Polar Parameters

Channel       | PDCCH (DL control); also PUCCH, PBCH
Info bits K   | 12 to 140
Code length N | 32 to 512 (max 1024)
CRC           | CRC-6, CRC-11, CRC-24
Decoder       | CA-SCL (L=8 typical)

Why Polar for Control?

Short blocks: Control info is 20-140 bits. Polar codes excel here.

Flexible rate: Just change the frozen set — no new code design needed.

Built-in CRC: Error detection comes free with CA-SCL.

No error floor: Critical for control reliability (BLER < 10⁻⁵).

The 5G Coding Standards Battle

History Standards

The Contenders (2016-2017)

Channel         | Candidates         | Winner
Data (long)     | LDPC, Polar, Turbo | LDPC
Control (short) | Polar, LDPC, TBCC  | Polar

Turbo codes — the incumbent from 4G — were eliminated early due to throughput limitations at 5G data rates.

The decision was intensely political. Huawei championed polar codes; Qualcomm pushed LDPC. The final split — LDPC for data, polar for control — was a compromise.

Technical Rationale

LDPC for data: superior throughput at long block lengths (1000+ bits), parallel decoding, mature hardware.

Polar for control: superior performance at short block lengths (20-200 bits), flexible rate, no error floor.

Impact

The 5G adoption validated 10 years of polar code research and established them as a mainstream coding family alongside LDPC.

Polar Codes Beyond Channel Coding

Mathematics

Source Coding (Compression)

Polar codes can achieve the source coding bound (lossless compression at entropy rate).

Dual of channel coding: freeze the "good" channels (carry randomness) and compress via the "bad" channels.

Channel coding: info on good channels
Source coding: info on bad channels

Joint Source-Channel Coding

Polar codes can simultaneously compress and protect — achieving the optimal rate-distortion-capacity trade-off.

This unification was one of Arikan's key insights: the same polarization phenomenon governs both compression and error correction.

Wiretap Channels

Polar codes achieve the secrecy capacity of wiretap channels — providing information-theoretic security.

Multiple Access

Polarization extends to multi-user channels, achieving the capacity region of the multiple access channel.

Hardware Implementation Challenges

Standards

SC Decoder

Pro: Simple, low area

Con: Sequential — latency O(N log N). For N=1024, that's ~10,000 clock cycles.

Throughput: limited to ~100 Mbps.

SCL Decoder

Challenge: L parallel SC decoders + sorting/pruning. Area grows linearly with L.

For L=8 in 5G: manageable in modern ASIC processes.

Fast SC Techniques

Technique        | Speedup
Rate-0 node skip | 2-3×
Rate-1 node skip | 2-3×
REP/SPC merge    | 1.5×
Unrolled decoder | 5-10×
Combined         | 10-30×

Fast-SSC (Simplified SC) identifies subtree patterns (all-frozen, all-info) and decodes them in one step.

Modern fast-SCL decoders achieve 1+ Gbps throughput — sufficient for 5G control channel requirements. Research into BP decoding of polar codes continues for higher throughput.

Rate Matching for Polar Codes

Mathematics Standards

The Challenge

Polar codes require N = 2^n, but 5G needs arbitrary output lengths M. Solution: encode at a power-of-two length N, then rate-match to M bits.

Puncturing

M < N: Delete N-M coded bits. Receiver treats punctured positions as erasures (LLR = 0).

Used when M is slightly less than N.

Shortening

M < N: Set N-M input bits to known values. Receiver knows these bits perfectly (LLR = ∞).

Better than puncturing when M << N.

Repetition

M > N: Repeat some coded bits. Receiver combines repeated copies for improved LLRs.

Used for very low rates.

5G NR uses a combination of these techniques with interleaving to support any desired output length and code rate. The frozen set must be adapted for the rate-matching scheme.
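On the receive side, the three schemes differ only in how the M received LLRs map back to N decoder inputs. A sketch of that de-matching step (the positions chosen here are illustrative; 5G NR selects them via a sub-block interleaver):

```python
import numpy as np

def rate_dematch(rx_llr, N, mode):
    """Map M received LLRs back to N decoder-input LLRs.
    Positions are illustrative (front of the block); 5G NR applies
    an interleaver to choose them."""
    M = len(rx_llr)
    if mode == "puncture":        # M < N: unknown bits -> LLR 0 (erasure)
        return np.concatenate([np.zeros(N - M), rx_llr])
    if mode == "shorten":         # M < N: known-zero bits -> LLR +inf
        return np.concatenate([rx_llr, np.full(N - M, np.inf)])
    if mode == "repeat":          # M > N: combine repeated copies by adding
        llr = np.zeros(N)
        for i, v in enumerate(rx_llr):
            llr[i % N] += v
        return llr
    raise ValueError(mode)

print(rate_dematch(np.array([1.0, -2.0, 0.5]), 4, "puncture"))
print(rate_dematch(np.array([1.0, -2.0, 0.5, 0.25, 2.0]), 4, "repeat"))
```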

Extensions and Future Directions

Mathematics

Non-Binary Polar Codes

Polarization works over any finite field GF(q). Non-binary kernels can achieve faster polarization — fewer "middle" channels at finite N.

Trade-off: better performance vs higher decoding complexity.

Larger Kernels

Replace the 2×2 kernel with larger matrices (3×3, 4×4, ...). Larger kernels polarize faster — better finite-length performance.

Polarization exponent β:
2×2: β = 0.5
Large kernel: β → 1

PAC Codes

Polarization-Adjusted Convolutional codes (Arikan, 2019): pre-transform the data with a convolutional code before polar encoding. Remarkable short-block performance.

Quantum Polar Codes

Polar codes have been extended to quantum error correction, achieving the quantum channel capacity. An active research area.

Polar Code Design: End-to-End

Mathematics Code
Choose N, K, R → Channel estimation → Compute Z(W^(i)) → Select frozen positions → Add CRC to info bits → Place in info positions → Butterfly encode → Rate match → Transmit

Receive → Rate de-match → CA-SCL decode → CRC check → Decoded info bits

This is exactly the flow used in 5G NR for control channel encoding/decoding. The entire process is defined in 3GPP TS 38.212.

Polar Codes: Summary

History Standards

Timeline

2009  Arikan's foundational paper
2011  SC List decoding (Tal & Vardy)
2012  CRC-aided SCL
2015  Fast-SSC decoding
2018  5G NR standardization
2019  PAC codes (Arikan)

Key Contributions

First provably capacity-achieving codes with efficient encoding/decoding

Elegant theory: one simple transform, recursive structure

Excellent short-block performance with CA-SCL

Deployed in the world's most advanced wireless standard

Polar codes closed a 60-year gap between Shannon's promise and practical reality. They represent the most significant theoretical advance in coding since turbo codes, and arguably since Shannon himself.