Presentation 03

AES

Design & Implementation

FIPS 197 · Rijndael · GF(2⁸) Arithmetic · S-box Construction
Key Schedule · Hardware Architectures · Side-Channel Countermeasures

Historical Context

The Road to AES

1977 — DES published (FIPS 46), 56-bit key

1997 — NIST calls for AES candidates

1998 — 15 submissions from 12 countries

2000 — Rijndael selected as winner

2001 — FIPS 197 published

2003 — NSA approves for classified data

Why Rijndael Won

✓ Clean algebraic structure — easy to analyze

✓ Best combination of security + performance

✓ Efficient on 8-bit µC through 64-bit CPUs

✓ Low memory footprint

✓ Parallelizable — suited to pipelining

✓ Simple key schedule

Designed by Joan Daemen & Vincent Rijmen
Katholieke Universiteit Leuven, Belgium

FIPS 197 (2001) · NIST AES Selection Report · Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002)

AES at a Glance

Block size: 128 bits   Key sizes: 128 / 192 / 256 bits   Rounds: 10 / 12 / 14

Substitution–Permutation Network (SPN)

Each round applies four transformations to a 4×4 byte State matrix:

SubBytes
ShiftRows
MixColumns
AddRoundKey

Final round omits MixColumns.

Three Design Layers

Non-linear layer — SubBytes (S-box)
Provides confusion via GF(2⁸) inversion

Linear diffusion layer — ShiftRows + MixColumns
Provides diffusion via byte transposition & MDS mixing

Key mixing layer — AddRoundKey
XOR of state with round key

The State Matrix

AES operates on a 4×4 array of bytes arranged in column-major order.

128-bit Input Block

in0 in1 in2 … in15

↓ maps to columns ↓

S₀₀ S₀₁ S₀₂ S₀₃
S₁₀ S₁₁ S₁₂ S₁₃
S₂₀ S₂₁ S₂₂ S₂₃
S₃₀ S₃₁ S₃₂ S₃₃

S[r, c] = in[r + 4c]

Column-Major Layout

in[0] in[4] in[8] in[12]
in[1] in[5] in[9] in[13]
in[2] in[6] in[10] in[14]
in[3] in[7] in[11] in[15]

Each column is a 32-bit word — this is critical for efficient 32-bit CPU implementations using T-tables.

The state is transformed round-by-round and written back in column-major order to produce the ciphertext.
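
A minimal Python sketch of the column-major mapping described above. The helper names bytes_to_state and state_to_bytes are illustrative and do not come from FIPS 197.

```python
def bytes_to_state(block):
    """Load 16 input bytes into a 4x4 state with state[r][c] = block[r + 4*c]."""
    if len(block) != 16:
        raise ValueError("AES blocks are exactly 16 bytes")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Write the state back out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


block = bytes(range(16))
state = bytes_to_state(block)
print(state[1])  # row 1 holds in[1], in[5], in[9], in[13]
```

Reading the state back with state_to_bytes returns the original 16 bytes, which is why the same layout rule serves for both plaintext input and ciphertext output.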

Round Structure

State = Plaintext

// Initial round key addition
AddRoundKey(State, Key[0])

// Main rounds 1 … Nr−1
for r = 1 to Nr − 1:
  SubBytes(State)
  ShiftRows(State)
  MixColumns(State)
  AddRoundKey(State, Key[r])

// Final round (no MixColumns)
SubBytes(State)
ShiftRows(State)
AddRoundKey(State, Key[Nr])

Ciphertext = State

Key Length   Rounds (Nr)   Round Keys
128 bits     10            11
192 bits     12            13
256 bits     14            15

GF(2⁸) Arithmetic Recap

All AES byte operations live in the finite field GF(2⁸) defined by the irreducible polynomial:

m(x) = x⁸ + x⁴ + x³ + x + 1   (0x11B)

Addition

Coefficient-wise XOR

a(x) ⊕ b(x)

No carries — each bit is independent.

Multiplication

Polynomial multiply then reduce mod m(x)

a(x) · b(x) mod m(x)

Efficiently implemented via xtime() — left-shift + conditional XOR with 0x1B.

xtime() — Multiply by x

def xtime(a):
    """Multiply by x in GF(2^8)"""
    result = a << 1
    if result & 0x100:      # overflow?
        result ^= 0x11B    # reduce mod m(x)
    return result & 0xFF
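
General multiplication can be built from xtime() alone. The sketch below is one straightforward shift-and-add formulation (the name gf_mul is the same one the Python slide later uses); it is not optimised and not constant-time.

```python
def xtime(a):
    """Multiply a byte by x (that is, by 0x02) in GF(2^8)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # reduce modulo m(x)
    return a & 0xFF


def gf_mul(a, b):
    """Shift-and-add: accumulate a * x^i for every set bit i of b."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product


print(hex(gf_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS 197
```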

Multiplicative Inverse

Every non-zero element has a unique inverse a⁻¹ such that a · a⁻¹ = 1

Because a²⁵⁵ = 1 for every non-zero a (the finite-field analogue of Fermat's little theorem): a⁻¹ = a²⁵⁴ in GF(2⁸)

Or use the Extended Euclidean Algorithm.
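
The exponentiation route can be sketched with square-and-multiply. The names gf_pow and gf_inv are illustrative; the code assumes the reduction polynomial 0x11B given earlier.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result


def gf_inv(a):
    """a^254; this also maps 0 to 0, matching the AES convention."""
    return gf_pow(a, 254)


print(hex(gf_inv(0x53)))  # 0xca
```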

FIPS 197 §4.2 · Presentation 01 covers GF(2⁸) in full depth

SubBytes — S-box Construction

The only non-linear operation in AES. Each byte is independently transformed.

Two-Step Construction

Step 1 Multiplicative inverse in GF(2⁸)

b = a⁻¹   (with 0⁻¹ defined as 0)

Provides non-linearity — max algebraic degree.

Step 2 Affine transformation over GF(2)

b′[i] = b[i] ⊕ b[(i+4) mod 8] ⊕ b[(i+5) mod 8]
   ⊕ b[(i+6) mod 8] ⊕ b[(i+7) mod 8] ⊕ c[i]

where c = 0x63 = 01100011₂

The affine step breaks the pure algebraic structure of the inverse, preventing interpolation attacks.
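
The two steps can be checked by generating the table instead of typing it in. This is a sketch for verification only (a deployed implementation would use a precomputed table or a constant-time circuit); affine and build_sbox are illustrative names.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def gf_inv(a):
    """Step 1: a^254, computed naively; 0 maps to 0."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def affine(b, c=0x63):
    """Step 2: bit i of the output follows the five-term XOR formula."""
    out = 0
    for i in range(8):
        bit = c >> i
        for k in (0, 4, 5, 6, 7):
            bit ^= b >> ((i + k) % 8)
        out |= (bit & 1) << i
    return out


def build_sbox():
    return [affine(gf_inv(a)) for a in range(256)]


SBOX = build_sbox()
print([hex(SBOX[a]) for a in (0x00, 0x01, 0x53, 0xFF)])
```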

S-box Properties

Non-linearity: 112 (the highest known for an 8-bit bijection)

Algebraic degree: 7

Differential uniformity: 4

Fixed points: 0 (no byte maps to itself)

No opposite fixed points
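
These properties can be recomputed from the table. The sketch below regenerates the S-box and then measures differential uniformity by exhaustive counting and non-linearity with a fast Walsh-Hadamard transform; the function names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    """Inverse in GF(2^8) followed by the affine map with constant 0x63."""
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


def differential_uniformity(sbox):
    """Largest entry of the difference distribution table for dx != 0."""
    worst = 0
    for dx in range(1, 256):
        counts = [0] * 256
        for x in range(256):
            counts[sbox[x] ^ sbox[x ^ dx]] += 1
        worst = max(worst, max(counts))
    return worst


def nonlinearity(sbox):
    """128 minus half the largest absolute Walsh coefficient."""
    largest = 0
    for mask in range(1, 256):
        # component function parity(mask & S(x)) encoded as +1 / -1
        w = [1 - 2 * (bin(mask & sbox[x]).count("1") & 1) for x in range(256)]
        h = 1
        while h < 256:  # in-place fast Walsh-Hadamard transform
            for i in range(0, 256, 2 * h):
                for j in range(i, i + h):
                    w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
            h *= 2
        largest = max(largest, max(abs(v) for v in w))
    return 128 - largest // 2


SBOX = build_sbox()
fixed_points = [a for a in range(256) if SBOX[a] == a]
opposite_fixed_points = [a for a in range(256) if SBOX[a] == a ^ 0xFF]
print(differential_uniformity(SBOX), nonlinearity(SBOX),
      fixed_points, opposite_fixed_points)
```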

Example

S(0x00) = 0x63   S(0x01) = 0x7C
S(0x53) = 0xED   S(0xFF) = 0x16

Input 0x53: inverse in GF(2⁸) = 0xCA, then affine → 0xED

The AES S-box (complete lookup table)

    .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .a .b .c .d .e .f
0.  63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1.  ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2.  b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3.  04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4.  09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5.  53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6.  d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7.  51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8.  cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9.  60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a.  e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b.  e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c.  ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d.  70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e.  e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f.  8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

Read as: S(0xRC) → row R, column C. Example: S(0x53) = row 5, col 3 → ED

ShiftRows

A byte-level transposition that shifts each row cyclically to the left.

Before ShiftRows

S₀₀ S₀₁ S₀₂ S₀₃
S₁₀ S₁₁ S₁₂ S₁₃
S₂₀ S₂₁ S₂₂ S₂₃
S₃₀ S₃₁ S₃₂ S₃₃

After ShiftRows

S₀₀ S₀₁ S₀₂ S₀₃
S₁₁ S₁₂ S₁₃ S₁₀
S₂₂ S₂₃ S₂₀ S₂₁
S₃₃ S₃₀ S₃₁ S₃₂

Row     Shift (bytes left)
Row 0   0 (no shift)
Row 1   1
Row 2   2
Row 3   3

Purpose: Prevents each column from being encrypted independently —
ensures bytes from different columns interact in MixColumns.
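
On a 4×4 list-of-rows state, the transposition is a one-line rotation per row. A small sketch with symbolic byte labels (function names are illustrative):

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r right by r positions, undoing shift_rows."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]


state = [[f"S{r}{c}" for c in range(4)] for r in range(4)]
for row in shift_rows(state):
    print(" ".join(row))
```

After the shift, every column holds one byte from each of the four original columns, which is what lets the following MixColumns step mix across the whole state.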

MixColumns

Each column is treated as a polynomial over GF(2⁸) and multiplied modulo x⁴ + 1 by a fixed polynomial c(x); equivalently, it is multiplied by a fixed MDS matrix.

Matrix Multiplication in GF(2⁸)

[02 03 01 01]   [S0]
[01 02 03 01] · [S1]
[01 01 02 03]   [S2]
[03 01 01 02]   [S3]

Only uses coefficients 01, 02, 03:

×01 = identity

×02 = xtime(a)

×03 = xtime(a) ⊕ a

MDS Property

Branch number = 5 (maximum possible for 4×4)

A difference in 1 input byte propagates to all 4 output bytes.

Any 2-byte input difference → at least 3-byte output difference.
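
A sketch of MixColumns on a single column using only xtime() and XOR, checked against a commonly quoted test column (db 13 53 45 → 8e 4d a1 bc). The function name mix_column is illustrative.

```python
def xtime(a):
    """Multiply by 0x02 in GF(2^8)."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mul3(a):
    """Multiply by 0x03: xtime(a) XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the 02/03/01/01 circulant matrix to one 4-byte column."""
    s0, s1, s2, s3 = col
    return [
        xtime(s0) ^ mul3(s1) ^ s2 ^ s3,
        s0 ^ xtime(s1) ^ mul3(s2) ^ s3,
        s0 ^ s1 ^ xtime(s2) ^ mul3(s3),
        mul3(s0) ^ s1 ^ s2 ^ xtime(s3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# a one-byte input difference reaches all four output bytes
print(mix_column([0x01, 0x00, 0x00, 0x00]))
```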

Inverse MixColumns

For decryption, use the inverse matrix with coefficients:

0E, 0B, 0D, 09

These are more expensive to compute — a key factor in hardware design.

The polynomial c(x) = {03}x³ + {01}x² + {01}x + {02} is coprime to x⁴ + 1, guaranteeing invertibility.
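
Invertibility can be confirmed numerically. The sketch below multiplies a column by a circulant matrix given its first row, then checks that the 0E/0B/0D/09 matrix undoes the 02/03/01/01 matrix. The helper circulant_mul is a hypothetical name used for illustration.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


MIX_ROW = [0x02, 0x03, 0x01, 0x01]      # first row of the MixColumns matrix
INV_MIX_ROW = [0x0E, 0x0B, 0x0D, 0x09]  # first row of the inverse matrix


def circulant_mul(first_row, col):
    """Row r of a circulant matrix is the first row rotated right by r."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(first_row[(c - r) % 4], col[c])
        out.append(acc)
    return out


mixed = circulant_mul(MIX_ROW, [0xDB, 0x13, 0x53, 0x45])
restored = circulant_mul(INV_MIX_ROW, mixed)
print([hex(b) for b in mixed], [hex(b) for b in restored])
```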

AddRoundKey

The simplest transformation: bitwise XOR of the state with the round key.

State′[i,j] = State[i,j] ⊕ Key[i,j]

Properties

✓ XOR is its own inverse — same for encrypt/decrypt

✓ Cheap in hardware: one XOR gate per state bit (128 in total), no lookup tables

✓ Introduces the secret key into every round

✓ Without it, the cipher would be a public permutation

Round Key Derivation

Each 128-bit round key is drawn from the expanded key schedule:

Keyᵣ = W[4r] ‖ W[4r+1] ‖ W[4r+2] ‖ W[4r+3]

where W[] is the expanded key array (44 words for AES-128)

Efficient 32-bit Implementation

On a 32-bit CPU, AddRoundKey is 4 XOR operations (one per column-word).

state[c] ^= roundkey[c]; // c = 0..3

Key Schedule (AES-128)

Expands the 128-bit cipher key into 44 × 32-bit words (11 round keys).

Key Expansion Algorithm

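def key_expansion(key):
    # Assumes helpers key_word, sub_word and rot_word that work on 32-bit words.
    # RCON is indexed from 1 as in FIPS 197: RCON[j] is the word
    # [x^(j-1), 00, 00, 00], so the constant sits in the most significant byte.
    W = [0] * 44
    # First 4 words = cipher key
    for i in range(4):
        W[i] = key_word(key, i)

    for i in range(4, 44):
        temp = W[i - 1]
        if i % 4 == 0:
            temp = sub_word(rot_word(temp))
            temp ^= RCON[i // 4]
        W[i] = W[i - 4] ^ temp
    return W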

Core Functions

RotWord — Rotate 4 bytes left by 1 position

[a₀ a₁ a₂ a₃] → [a₁ a₂ a₃ a₀]

SubWord — Apply S-box to each of 4 bytes

[a₀ a₁ a₂ a₃] → [S(a₀) S(a₁) S(a₂) S(a₃)]

Rcon[i] — Round constant

= [xⁱ⁻¹, 00, 00, 00] in GF(2⁸)

01, 02, 04, 08, 10, 20, 40, 80, 1B, 36

Design rationale: Rcon prevents symmetry between round keys. SubWord introduces non-linearity. Simple XOR-chain propagates change across all words.
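
A runnable sketch of the AES-128 expansion with the core functions spelled out. It regenerates the S-box and round constants instead of hard-coding them, stores RCON as a 0-indexed Python list of bytes (so the index is i // 4 − 1 and the byte is shifted into the top of the word), and is checked against the FIPS 197 Appendix A.1 key. Helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    """Inverse in GF(2^8) followed by the affine map with constant 0x63."""
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


SBOX = build_sbox()

# RCON[j] = x^j in GF(2^8); 0-indexed here, unlike the 1-indexed Rcon of FIPS 197.
RCON = [1]
for _ in range(9):
    RCON.append(gf_mul(RCON[-1], 2))


def rot_word(w):
    """[a0 a1 a2 a3] -> [a1 a2 a3 a0] on a big-endian 32-bit word."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def sub_word(w):
    """Apply the S-box to each of the four bytes."""
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def key_expansion(key):
    """Expand a 16-byte key into 44 big-endian 32-bit words."""
    W = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    for i in range(4, 44):
        temp = W[i - 1]
        if i % 4 == 0:
            temp = sub_word(rot_word(temp)) ^ (RCON[i // 4 - 1] << 24)
        W.append(W[i - 4] ^ temp)
    return W


W = key_expansion(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(f"{W[4]:08x} {W[43]:08x}")
```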

Key Schedule — AES-192 & AES-256

AES-192 (Nk = 6)

Expands 192-bit key into 52 words (13 round keys)

Non-linear function (SubWord + RotWord + Rcon) applied every 6th word:

if i mod 6 == 0: apply g()

Otherwise: simple XOR chain

AES-256 (Nk = 8)

Expands 256-bit key into 60 words (15 round keys)

Extra SubWord applied at position 4 within each 8-word group:

if i mod 8 == 0: apply g()
if i mod 8 == 4: SubWord only

This extra non-linear step was added to resist related-key attacks.

Variant   Key Words (Nk)   Expanded Words   Round Keys   g() applied every
AES-128   4                44               11           4 words
AES-192   6                52               13           6 words
AES-256   8                60               15           8 words (+ SubWord at 4)
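
The three variants differ only in Nk, so one routine can cover them all. The sketch below is a generalisation under the same conventions as the AES-128 slide (big-endian words, g() = SubWord ∘ RotWord plus Rcon); helper names are illustrative, and the S-box is generated rather than hard-coded.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]  # 0-indexed


def sub_word(w):
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def key_expansion(key):
    """Expand a 16-, 24- or 32-byte key into 4 * (Nr + 1) words."""
    nk = len(key) // 4
    if len(key) % 4 or nk not in (4, 6, 8):
        raise ValueError("key must be 16, 24 or 32 bytes")
    nr = nk + 6
    W = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    for i in range(nk, 4 * (nr + 1)):
        temp = W[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (RCON[i // nk - 1] << 24)
        elif nk == 8 and i % nk == 4:
            temp = sub_word(temp)  # extra SubWord, AES-256 only
        W.append(W[i - nk] ^ temp)
    return W


k192 = bytes.fromhex("8e73b0f7da0e6452c810f32b809079e562f8ead2522c6b7b")
k256 = bytes.fromhex("603deb1015ca71be2b73aef0857d7781"
                     "1f352c073b6108d72d9810a30914dff4")
print(len(key_expansion(k192)), len(key_expansion(k256)))  # 52 60
```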

Decryption — Inverse Cipher

AES decryption applies the inverse of each operation in reverse order.

Direct Inverse

AddRoundKey(State, Key[Nr])

for r = Nr−1 downto 1:
  InvShiftRows(State)
  InvSubBytes(State)
  AddRoundKey(State, Key[r])
  InvMixColumns(State)

InvShiftRows(State)
InvSubBytes(State)
AddRoundKey(State, Key[0])

Equivalent Inverse Cipher

Key insight: By reordering InvSubBytes↔InvShiftRows and InvMixColumns↔AddRoundKey, the inverse cipher has the same structure as encryption.

This requires applying InvMixColumns to round keys 1…Nr−1 during key expansion.
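
The InvMixColumns↔AddRoundKey swap works because InvMixColumns is linear over GF(2): InvMix(s ⊕ k) = InvMix(s) ⊕ InvMix(k). A small numerical check of that identity on one column, with illustrative helper names and arbitrary test values:

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


INV_ROWS = [
    (0x0E, 0x0B, 0x0D, 0x09),
    (0x09, 0x0E, 0x0B, 0x0D),
    (0x0D, 0x09, 0x0E, 0x0B),
    (0x0B, 0x0D, 0x09, 0x0E),
]


def inv_mix_column(col):
    """Multiply one column by the InvMixColumns matrix."""
    out = []
    for row in INV_ROWS:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


rng = random.Random(0)
state_col = [rng.randrange(256) for _ in range(4)]
key_col = [rng.randrange(256) for _ in range(4)]

# Key added first, then InvMixColumns (direct inverse cipher order)
direct = inv_mix_column([s ^ k for s, k in zip(state_col, key_col)])
# InvMixColumns first, then a transformed round key (equivalent inverse order)
swapped = [a ^ b for a, b in zip(inv_mix_column(state_col),
                                 inv_mix_column(key_col))]
print(direct == swapped)  # True
```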

Inverse Operations

Encrypt               Decrypt                        Cost change
SubBytes              InvSubBytes                    Same (separate LUT)
ShiftRows             InvShiftRows                   Shift right
MixColumns ×{02,03}   InvMixColumns ×{09,0B,0D,0E}   ~3× more expensive
AddRoundKey           AddRoundKey                    XOR = self-inverse

T-Table Optimisation (Software)

On 32-bit processors, SubBytes + ShiftRows + MixColumns can be fused into four 256-entry 32-bit lookup tables.

T-box Definition

T₀[a] = [ 02·S(a), S(a), S(a), 03·S(a) ]
T₁[a] = [ 03·S(a), 02·S(a), S(a), S(a) ]
T₂[a] = [ S(a), 03·S(a), 02·S(a), S(a) ]
T₃[a] = [ S(a), S(a), 03·S(a), 02·S(a) ]

One Column of Output

e[j] = T₀[a[0,j]] ⊕ T₁[a[1,j+1]] ⊕ T₂[a[2,j+2]] ⊕ T₃[a[3,j+3]] ⊕ K[j]   (column indices taken mod 4)
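
To see that the four lookups really reproduce SubBytes + ShiftRows + MixColumns, the sketch below builds the tables from a generated S-box and compares one round (without the key XOR) against the step-by-step computation. The byte packing (first byte in the most significant position) and all helper names are assumptions of this sketch; the tables are shown for illustration only, given the cache-timing concerns such tables raise.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


SBOX = build_sbox()
MATRIX = [(2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2)]
# T[i][a] packs S(a) times column i of the MixColumns matrix into one word.
T = [[int.from_bytes(bytes(gf_mul(MATRIX[r][i], s) for r in range(4)), "big")
      for s in SBOX] for i in range(4)]


def t_table_column(state, j):
    """Output column j from four table lookups (state[r][c], no key XOR)."""
    word = 0
    for r in range(4):
        word ^= T[r][state[r][(j + r) % 4]]
    return list(word.to_bytes(4, "big"))


def reference_column(state, j):
    """Same column via SubBytes, ShiftRows, then MixColumns."""
    col = [SBOX[state[r][(j + r) % 4]] for r in range(4)]
    out = []
    for row in MATRIX:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


state = [[(17 * (r + 4 * c) + 3) & 0xFF for c in range(4)] for r in range(4)]
print(all(t_table_column(state, j) == reference_column(state, j)
          for j in range(4)))  # True
```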

Performance

Metric          S-box       T-table
Tables          1 × 256 B   4 × 1 KB
Total memory    256 B       4 KB
Lookups/round   16          16
XORs/round      many        16 (32-bit)
Ops per round   ~160        ~20

⚠ Cache-timing vulnerability

Table access patterns leak key-dependent indices. Bernstein (2005) demonstrated full AES key recovery from timing measurements alone.

AES-NI — Hardware Instructions

Intel processors have included dedicated AES instructions since Westmere (2010); AMD followed with Bulldozer (2011).

Instruction Set

Instruction       Operation
AESENC            One encryption round
AESENCLAST        Final encryption round
AESDEC            One decryption round
AESDECLAST        Final decryption round
AESKEYGENASSIST   Key schedule helper
AESIMC            InvMixColumns (for EqInv keys)

Performance Impact

AESENC: latency of a few cycles (not one), but fully pipelined, so rounds of independent blocks can issue back to back

With AES-NI, AES-128-CTR achieves ~4 cycles/byte on modern x86.

That's ~1 GB/s (~8 Gbps) single-core at 4 GHz.

Security Benefits

✓ Constant-time execution — immune to cache-timing attacks

✓ No S-box/T-table in memory — no table access patterns

✓ S-box computed in hardware via GF((2⁴)²) composite field

Intel® Advanced Encryption Standard Instructions (AES-NI) White Paper (2010) · ARM Cryptographic Extension (ARMv8-A, 2011)

Hardware Architectures

AES hardware spans a wide design space from compact IoT to high-throughput network processors.

Basic Iterative

Single round, reused Nr times

Area: ~3,000–5,000 GE

Throughput: ~1–3 Gbps

Latency: 10–14 cycles

Ideal for area-constrained IoT & smart cards

Inner-Round Pipelined

Pipeline registers within a single round

Area: ~8,000–15,000 GE

Throughput: ~5–15 Gbps

Latency: 20–40 cycles

Balances area and speed

Fully Unrolled + Pipelined

All 10/12/14 rounds instantiated in hardware

Area: ~50,000–170,000 GE

Throughput: ~30–53 Gbps

Latency: 10–14 cycles

Maximum throughput for network & SoC

Architecture                      Platform        Area           Throughput   Efficiency (Gbps/kGE)
8-bit serial                      180nm ASIC      ~3.1 kGE       ~80 Mbps     0.026
Iterative 128-bit                 Virtex-5 FPGA   1,364 slices   3.2 Gbps     n/a
Sub-pipelined                     Virtex-6 FPGA   2,100 slices   12.8 Gbps    n/a
Composite-field (Mathew et al.)   45nm ASIC       56 kGE         53 Gbps      0.946

Mathew et al., "53 Gbps AES" IEEE JSSC 2011 · Chodowiec & Gaj, "Very Compact FPGA AES" (2003)

S-box Hardware — Composite Fields

The S-box is the most expensive block in AES hardware. Two main approaches exist.

LUT-Based (ROM/BRAM)

Store the 256×8-bit table directly:

✓ Simple — direct memory lookup

✓ FPGA: fits in one 18Kb BRAM

✗ 16 S-boxes needed → 16 BRAMs per round

✗ Larger area on ASIC

✗ Susceptible to power side-channels

Composite-Field (GF((2⁴)²))

Decompose GF(2⁸) into tower field GF((2⁴)²):

Inversion in GF(2⁸) → operations in GF(2⁴)

✓ ~20% fewer gates (Canright, 2005)

✓ Used in Intel AES-NI hardware

✓ Better for ASIC — pure logic

✗ More complex design verification

Canright's S-box: only 92 GE per S-box using GF(2²) subfield decomposition, the most compact design known when it was published.

Tower Field Decomposition

GF(2⁸) → isomorphic to GF((2⁴)²) → further to GF(((2²)²)²)
Each inversion reduces to multiplications and squarings in progressively smaller subfields, eventually reaching GF(2²) where inversion is just XOR + AND gates.
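
The core trick can be sketched one level down the tower. The code below builds GF((2⁴)²) directly as pairs a₁y + a₀ over GF(2⁴), with y² = y + ν, and inverts via (a₁y + a₀)⁻¹ = (a₁y + (a₁ ⊕ a₀)) · Δ⁻¹ where Δ = a₁²ν ⊕ a₁a₀ ⊕ a₀². The choice of x⁴ + x + 1 for GF(2⁴), the search for ν, and all names are assumptions of this sketch. It omits the basis change that maps AES bytes into this representation, so it illustrates the subfield inversion only and is not Canright's circuit.

```python
def mul16(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1."""
    p = 0
    for _ in range(4):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return p


def inv16(a):
    """Inverse in GF(2^4) as a^14 = a^8 * a^4 * a^2 (0 maps to 0)."""
    a2 = mul16(a, a)
    a4 = mul16(a2, a2)
    a8 = mul16(a4, a4)
    return mul16(mul16(a8, a4), a2)


# Pick nu so that y^2 + y + nu has no root in GF(2^4), hence is irreducible.
NU = next(n for n in range(1, 16)
          if all(mul16(t, t) ^ t != n for t in range(16)))


def mul256(x, y):
    """Multiply (a1*y + a0)(b1*y + b0) using y^2 = y + NU; packed as hi|lo nibbles."""
    a1, a0, b1, b0 = x >> 4, x & 15, y >> 4, y & 15
    hh = mul16(a1, b1)
    hi = hh ^ mul16(a1, b0) ^ mul16(a0, b1)
    lo = mul16(hh, NU) ^ mul16(a0, b0)
    return (hi << 4) | lo


def inv256(x):
    """One GF(2^4) inversion plus a handful of GF(2^4) multiplications."""
    a1, a0 = x >> 4, x & 15
    delta = mul16(mul16(a1, a1), NU) ^ mul16(a1, a0) ^ mul16(a0, a0)
    d_inv = inv16(delta)
    return (mul16(a1, d_inv) << 4) | mul16(a1 ^ a0, d_inv)


print(all(mul256(x, inv256(x)) == 1 for x in range(1, 256)))  # True
```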

Canright, "A Very Compact Rijndael S-box" (2005) · Mathew et al., IEEE JSSC (2011)

Modes of Operation

AES is a block cipher (128-bit). To encrypt arbitrary-length data, use a mode of operation.

ECB

Each block encrypted independently

✗ Deterministic — reveals patterns

✗ Never use for real data

CBC

Each block XORed with previous ciphertext

✓ Widely deployed (TLS ≤1.2)

✗ Sequential — can't parallelize encryption

✗ Padding oracle attacks

CTR

Encrypt counter → XOR with plaintext

✓ Fully parallelizable

✓ Needs only the block-encrypt direction, so no decryption datapath in hardware

✗ No integrity protection

GCM (Galois/Counter Mode) ★

CTR encryption + GHASH authentication

✓ Authenticated encryption (AEAD)

✓ Fully parallelizable

✓ Hardware-friendly: GHASH uses GF(2¹²⁸)

The standard choice for TLS 1.3, IPsec, SSH

Other Notable Modes

XTS — Tweakable, used for disk encryption

CCM — CTR + CBC-MAC, used in Wi-Fi (WPA2/3)

SIV — Nonce-misuse resistant AEAD

GCM-SIV — Nonce-misuse resistant GCM variant
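
A sketch of the CTR construction. To stay short and self-contained it uses a stand-in block function built from SHA-256 (toy_block_encrypt, which is NOT AES and is purely hypothetical); in practice the block function would be AES encryption under the key. The 12-byte nonce plus 4-byte big-endian counter layout is also an assumption of this sketch.

```python
import hashlib


def toy_block_encrypt(key, block):
    """Stand-in for AES block encryption (NOT AES): a 16-byte keyed hash."""
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xcrypt(key, nonce, data, block_encrypt=toy_block_encrypt):
    """Encrypt or decrypt: XOR the data with E(key, nonce || counter)."""
    if len(nonce) != 12:
        raise ValueError("this sketch expects a 12-byte nonce")
    out = bytearray()
    for offset in range(0, len(data), 16):
        counter_block = nonce + (offset // 16).to_bytes(4, "big")
        keystream = block_encrypt(key, counter_block)
        chunk = data[offset:offset + 16]
        out += bytes(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)


key, nonce = b"k" * 16, b"n" * 12
message = b"A" * 40  # repeated plaintext blocks, no padding needed
ciphertext = ctr_xcrypt(key, nonce, message)
print(ctr_xcrypt(key, nonce, ciphertext) == message)  # True
```

Only the forward block function is ever called, each keystream block depends solely on its counter, and nothing here detects tampering, which matches the three CTR bullet points above. Reusing a nonce under the same key repeats the keystream and must be avoided.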

Side-Channel Attacks on AES

Attack Vectors

TIMING

T-table cache access patterns reveal key-dependent indices. Bernstein's attack (2005) recovered a full AES-128 key from a network server using timing measurements.

POWER / EM

Differential Power Analysis (DPA) — correlate power traces with hypothetical S-box outputs across many encryptions.

Target: first or last round SubBytes, where known plaintext/ciphertext meets key.

FAULT

Differential Fault Analysis (DFA) — inject faults in round 8/9, compare correct/faulty ciphertexts → recover key.

A single byte fault in round 9 can reduce key search to ~2³².

Key Targets

The S-box is the primary target because:

1. Only non-linear operation → strongest correlation with power

2. Operates byte-by-byte → divide-and-conquer (16 × 2⁸ vs 2¹²⁸)

3. T-table indices directly reveal key bytes
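
The divide-and-conquer point can be illustrated with a simulated correlation attack on a single key byte. Everything about the "device" here is an assumption of the sketch: leakage is modelled as the Hamming weight of the first-round S-box output plus Gaussian noise, with a fixed random seed, 400 traces and an arbitrary secret byte.

```python
import random


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


SBOX = build_sbox()
HW = [bin(v).count("1") for v in range(256)]  # Hamming weight table


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


rng = random.Random(1)
SECRET = 0x2B  # the key byte the simulated device uses
plaintexts = [rng.randrange(256) for _ in range(400)]
traces = [HW[SBOX[p ^ SECRET]] + rng.gauss(0, 1.0) for p in plaintexts]


def recover_key_byte(plaintexts, traces):
    """Try all 2^8 guesses; keep the one whose predictions correlate best."""
    scores = [pearson([HW[SBOX[p ^ g]] for p in plaintexts], traces)
              for g in range(256)]
    return max(range(256), key=scores.__getitem__)


print(hex(recover_key_byte(plaintexts, traces)))
```

Repeating the same 256-guess search for each of the 16 key bytes is the 16 × 2⁸ effort quoted in the list above.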

Combined attacks exploit both fault injection and power analysis simultaneously, defeating higher-order masking countermeasures.

Roche et al. (2011), Dassance & Venelli (2012)

Cache attacks — Flush+Reload, Prime+Probe on shared L3 cache can extract AES keys across VMs.

Side-Channel Countermeasures

Software Countermeasures

MASKING

Split every sensitive variable into d+1 random shares such that x = x₁ ⊕ x₂ ⊕ … ⊕ x_(d+1)

d-th-order masking requires d+1 shares; the attacker must combine d+1 trace points.
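
A sketch of Boolean sharing for one byte. It covers only the sharing itself and a linear step (AddRoundKey, which can be applied to a single share); masking the non-linear S-box needs dedicated techniques that are out of scope here. The function names are illustrative.

```python
import secrets
from functools import reduce
from operator import xor


def share(x, order):
    """Split byte x into order + 1 shares whose XOR equals x."""
    shares = [secrets.randbelow(256) for _ in range(order)]
    shares.append(reduce(xor, shares, x))  # last share fixes the XOR sum
    return shares


def unshare(shares):
    """Recombine shares; a real implementation avoids doing this mid-cipher."""
    return reduce(xor, shares, 0)


def masked_add_round_key(shares, key_byte):
    """XOR is linear, so adding the key to one share is sufficient."""
    return [shares[0] ^ key_byte] + shares[1:]


x, k = 0xA7, 0x3C
shares = share(x, order=2)  # second-order masking: 3 shares
print(len(shares), unshare(masked_add_round_key(shares, k)) == x ^ k)  # 3 True
```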

SHUFFLING

Randomize the order of S-box computations within a round (16 S-box calls can be permuted in 16! ways).

BITSLICING

Compute S-box as Boolean circuit — constant time, no table lookups. 32 AES blocks in parallel on 32-bit CPU.

Hardware Countermeasures

DUAL-RAIL LOGIC

Each bit represented as complementary pair (a, ā). Constant Hamming weight per operation.

THRESHOLD IMPLEMENTATIONS

Hardware secret sharing with provable glitch resistance. Each share processed by independent logic.

REDUNDANCY (DFA defense)

Temporal: encrypt twice, compare
Spatial: TMR (triple modular redundancy)
Inverse check: decrypt ciphertext, compare with plaintext

NOISE INJECTION

Random delays, dummy operations, clock jitter to decorrelate power traces.

Python Implementation (AES-128)

SBOX = [
    0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76,
    0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0,
    # ... full 256-entry table ...
]
RCON = [0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80,0x1b,0x36]

def xtime(a):            return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else (a << 1) & 0xFF
def gf_mul(a, b):        # multiply in GF(2^8) via repeated xtime
    p = 0
    for _ in range(8):
        if b & 1: p ^= a
        a = xtime(a); b >>= 1
    return p

def sub_bytes(state):    return [SBOX[b] for b in state]
def shift_rows(s):       return [s[0],s[5],s[10],s[15], s[4],s[9],s[14],s[3],
                                 s[8],s[13],s[2],s[7],   s[12],s[1],s[6],s[11]]
def mix_columns(s):
    r = []
    for c in range(4):
        i = c * 4
        r += [gf_mul(2,s[i])^gf_mul(3,s[i+1])^s[i+2]^s[i+3],
              s[i]^gf_mul(2,s[i+1])^gf_mul(3,s[i+2])^s[i+3],
              s[i]^s[i+1]^gf_mul(2,s[i+2])^gf_mul(3,s[i+3]),
              gf_mul(3,s[i])^s[i+1]^s[i+2]^gf_mul(2,s[i+3])]
    return r

def add_round_key(s, k):  return [a ^ b for a, b in zip(s, k)]

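def key_expansion(key):           # AES-128: 16-byte key -> 11 round keys of 16 bytes
    w = [list(key[4*i:4*i+4]) for i in range(4)]
    for i in range(4, 44):
        t = list(w[i-1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # RotWord, then SubWord
            t[0] ^= RCON[i // 4 - 1]               # RCON list above is 0-indexed
        w.append([a ^ b for a, b in zip(w[i-4], t)])
    return [[b for word in w[4*r:4*r+4] for b in word] for r in range(11)]

def aes_encrypt(plaintext, key):  # both 16-byte lists
    rkeys = key_expansion(key)     # list of 11 × 16-byte round keys
    state = add_round_key(plaintext, rkeys[0])
    for r in range(1, 10):
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, rkeys[r])
    state = sub_bytes(state)
    state = shift_rows(state)
    state = add_round_key(state, rkeys[10])
    return state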
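
For a version that runs as is, the sketch below follows the same structure but generates the S-box instead of relying on the abbreviated table, and checks the result against the FIPS 197 Appendix B and C.1 vectors. It is a teaching aid under those assumptions: it is neither constant-time nor fast, and the helper build_sbox is an illustrative name.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p


def build_sbox():
    sbox = []
    for a in range(256):
        b = 1
        for _ in range(254):  # b = a^254 = a^-1, and 0 stays 0
            b = gf_mul(b, a)
        rot = lambda k: ((b << k) | (b >> (8 - k))) & 0xFF
        sbox.append(b ^ rot(1) ^ rot(2) ^ rot(3) ^ rot(4) ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def key_expansion(key):
    """16-byte key -> 11 round keys, each a flat list of 16 bytes."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord, then SubWord
            t[0] ^= RCON[i // 4 - 1]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return [[b for word in w[4 * r:4 * r + 4] for b in word] for r in range(11)]


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # flat column-major state: index i = r + 4c, row r rotates left by r
    return [s[(i + 4 * (i % 4)) % 16] for i in range(16)]


def mix_columns(s):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gf_mul(2, a0) ^ gf_mul(3, a1) ^ a2 ^ a3,
                a0 ^ gf_mul(2, a1) ^ gf_mul(3, a2) ^ a3,
                a0 ^ a1 ^ gf_mul(2, a2) ^ gf_mul(3, a3),
                gf_mul(3, a0) ^ a1 ^ a2 ^ gf_mul(2, a3)]
    return out


def add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def aes128_encrypt(plaintext, key):
    """Encrypt one 16-byte block with a 16-byte key."""
    rkeys = key_expansion(key)
    state = add_round_key(list(plaintext), rkeys[0])
    for r in range(1, 10):
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, rkeys[r])
    state = shift_rows(sub_bytes(state))
    return bytes(add_round_key(state, rkeys[10]))


ct = aes128_encrypt(bytes.fromhex("3243f6a8885a308d313198a2e0370734"),
                    bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(ct.hex())  # FIPS 197 Appendix B expects 3925841d02dc09fbdc118597196a0b32
```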

SystemVerilog — AES Round

module aes_round (
    input  logic [127:0] state_in,
    input  logic [127:0] round_key,
    input  logic         last_round,  // skip MixColumns if high
    output logic [127:0] state_out
);

    logic [127:0] after_sub, after_shift, after_mix;

    // ---- SubBytes: 16 parallel S-box instances ----
    genvar i;
    generate
        for (i = 0; i < 16; i++) begin : g_sbox
            aes_sbox u_sbox (
                .in  (state_in[8*i +: 8]),
                .out (after_sub[8*i +: 8])
            );
        end
    endgenerate

    // ---- ShiftRows: wire-level byte permutation ----
    assign after_shift = {
        after_sub[127:120], after_sub[ 87: 80], after_sub[ 47: 40], after_sub[  7:  0],
        after_sub[ 95: 88], after_sub[ 55: 48], after_sub[ 15:  8], after_sub[103: 96],
        after_sub[ 63: 56], after_sub[ 23: 16], after_sub[111:104], after_sub[ 71: 64],
        after_sub[ 31: 24], after_sub[119:112], after_sub[ 79: 72], after_sub[ 39: 32]
    };

    // ---- MixColumns: 4 column mixers ----
    generate
        for (i = 0; i < 4; i++) begin : g_mix
            aes_mix_column u_mix (
                .col_in  (after_shift[32*i +: 32]),
                .col_out (after_mix[32*i +: 32])
            );
        end
    endgenerate

    // ---- AddRoundKey ----
    assign state_out = (last_round ? after_shift : after_mix) ^ round_key;

endmodule

Performance Landscape

Platform          Implementation                  Throughput   Latency         Notes
8-bit AVR         Byte-serial, table S-box        ~1.5 Mbps    ~2,700 cycles   IoT / smart cards
ARM Cortex-M4     Bitsliced                       ~45 Mbps     ~480 cycles     Embedded
x86 (no AES-NI)   T-table, OpenSSL                ~3.2 Gbps    ~160 cycles     Software, 4 GHz
x86 (AES-NI)      AESENC pipeline                 ~8 Gbps      ~40 cycles      Single-core CTR
Virtex-5 FPGA     Iterative 128-bit               ~3.2 Gbps    11 cycles       1,364 slices
Virtex-7 FPGA     Full pipeline                   ~24 Gbps     11 cycles       ~4,200 slices
45nm ASIC         Composite S-box, full pipeline  ~53 Gbps     10 cycles       56 kGE, Mathew et al.
CUDA GPU          Parallel ECB/CTR                ~135 Gbps    n/a             RTX 4090, batch mode

From smart card (1.5 Mbps) to GPU (135 Gbps): a 90,000× throughput range. AES was designed for this versatility.

Cryptanalysis Status (2025)

Best Known Attacks

Biclique attack (Bogdanov et al., 2011)

AES-128: 2^126.1   AES-192: 2^189.7   AES-256: 2^254.4

Negligible improvement over brute force: roughly a 2-bit gain.

Related-key attacks (Biryukov & Khovratovich, 2009)

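AES-256: 2^99.5 (chosen related keys)

Impractical: requires encryptions under keys with an attacker-chosen relation, which well-designed protocols never provide. The attack exploits the comparatively slow diffusion of the AES-256 key schedule.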

Square / integral attacks

Effective up to 6–7 rounds. Full AES (10+ rounds) has comfortable security margin.

Security Assessment

AES remains unbroken

No practical attack reducing security below the intended level exists as of 2025.

Post-Quantum Consideration

Grover's algorithm reduces brute-force to 2^(n/2):

AES-128 → ~2⁶⁴ quantum ops

AES-256 → ~2¹²⁸ quantum ops

Recommendation: use AES-256 for quantum-resistant symmetric encryption.

CNSA 2.0 (NSA) mandates AES-256.

Where AES Is Deployed

Network Security

TLS 1.3 (AES-128/256-GCM)

IPsec / IKEv2

SSH

WPA3 (AES-CCM, AES-GCMP)

Signal Protocol (AES-256-CBC)

Storage Encryption

BitLocker (AES-XTS)

FileVault 2 (AES-XTS-128)

LUKS / dm-crypt

Android FBE

Self-Encrypting Drives (TCG Opal)

Hardware / SoC

Intel AES-NI (Westmere+)

ARM Crypto Extensions (v8-A+)

RISC-V Zkn extension

Qualcomm Inline Crypto Engine

Apple Secure Enclave

AES protects a large share of internet traffic, disk storage, and mobile communications worldwide, and is generally considered the most widely deployed symmetric cipher in history.

Design Philosophy

Daemen & Rijmen's three core principles:

Resistance to Attacks

S-box from GF(2⁸) inverse: optimal non-linearity. MDS matrix for max diffusion. Wide trail strategy provides provable bounds against differential and linear cryptanalysis.

Performance & Efficiency

T-table fusion for 32-bit CPUs. Byte-oriented for 8-bit µC. Wire-only ShiftRows. Simple xtime() for MixColumns. Key schedule computable on the fly.

Simplicity & Transparency

No secret design constants — everything derived from mathematical first principles. Simple algebraic structure enables thorough analysis and prevents suspicion of trapdoors.

Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002) ch. 5–9

References

Standards

FIPS 197 — Advanced Encryption Standard (AES), NIST, 2001

NIST SP 800-38A — Modes of Operation

NIST SP 800-38D — GCM Recommendation

CNSA 2.0 — NSA Quantum-Resistant Algorithm Suite

Design & Theory

Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002)

Daemen & Rijmen, AES Proposal: Rijndael v2 (1999)

Shannon, "Communication Theory of Secrecy Systems" (1949)

Hardware

Canright, "A Very Compact Rijndael S-box" (2005)

Mathew et al., "53 Gbps AES in 45nm" IEEE JSSC (2011)

Gaj & Chodowiec, "FPGA and ASIC Implementations of AES" in Cryptographic Engineering (2009)

Cryptanalysis & Side-Channels

Bogdanov et al., "Biclique Cryptanalysis of AES" (ASIACRYPT 2011)

Biryukov & Khovratovich, "Related-Key Attacks on AES-256" (CRYPTO 2009)

Bernstein, "Cache-Timing Attacks on AES" (2005)

Kocher et al., "Differential Power Analysis" (CRYPTO 1999)

Part of the Modern Cryptography Presentation Series