Presentation 03
Design & Implementation
FIPS 197 · Rijndael · GF(2⁸) Arithmetic · S-box Construction
Key Schedule · Hardware Architectures · Side-Channel Countermeasures
1977 — DES published (FIPS 46), 56-bit key
1997 — NIST calls for AES candidates
1998 — 15 submissions from 12 countries
2000 — Rijndael selected as winner
2001 — FIPS 197 published
2003 — NSA approves for classified data
✓ Clean algebraic structure — easy to analyze
✓ Best combination of security + performance
✓ Efficient on 8-bit µC through 64-bit CPUs
✓ Low memory footprint
✓ Parallelizable — suited to pipelining
✓ Simple key schedule
Designed by Joan Daemen & Vincent Rijmen
Katholieke Universiteit Leuven, Belgium
FIPS 197 (2001) · NIST AES Selection Report · Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002)
Each round applies four transformations to a 4×4 byte State matrix:
Final round omits MixColumns.
Non-linear layer — SubBytes (S-box)
Provides confusion via GF(2⁸) inversion
Linear diffusion layer — ShiftRows + MixColumns
Provides diffusion via byte transposition & MDS mixing
Key mixing layer — AddRoundKey
XOR of state with round key
AES operates on a 4×4 array of bytes arranged in column-major order.
in[0] in[1] in[2] … in[15]
↓ maps to columns ↓
S[r,c] = in[r + 4c]
in[0] in[4] in[8] in[12]
in[1] in[5] in[9] in[13]
in[2] in[6] in[10] in[14]
in[3] in[7] in[11] in[15]
Each column is a 32-bit word — this is critical for efficient 32-bit CPU implementations using T-tables.
The state is transformed round-by-round and written back in column-major order to produce the ciphertext.
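The column-major load/store convention can be sketched directly (a small illustration; `load_state` and `store_state` are hypothetical helper names, not part of FIPS 197):

```python
def load_state(inp):
    """Map a 16-byte input block to the 4x4 State: S[r][c] = inp[r + 4*c]."""
    return [[inp[r + 4 * c] for c in range(4)] for r in range(4)]

def store_state(state):
    """Write the State back to a flat 16-byte list in column-major order."""
    return [state[r][c] for c in range(4) for r in range(4)]

block = list(range(16))             # in[0] .. in[15]
S = load_state(block)
assert S[1][2] == block[1 + 4 * 2]  # byte in[9] sits at row 1, column 2
assert store_state(S) == block      # the round-trip is lossless
```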
State = Plaintext
// Initial round key addition
AddRoundKey(State, Key[0])
// Main rounds 1 … Nr−1
for r = 1 to Nr − 1:
    SubBytes(State)
    ShiftRows(State)
    MixColumns(State)
    AddRoundKey(State, Key[r])
// Final round (no MixColumns)
SubBytes(State)
ShiftRows(State)
AddRoundKey(State, Key[Nr])
Ciphertext = State
| Key Length | Rounds (Nr) | Round Keys |
|---|---|---|
| 128 bits | 10 | 11 |
| 192 bits | 12 | 13 |
| 256 bits | 14 | 15 |
All AES byte operations live in the finite field GF(2⁸) defined by the irreducible polynomial:
m(x) = x⁸ + x⁴ + x³ + x + 1 (0x11B)
Coefficient-wise XOR
a(x) ⊕ b(x)
No carries — each bit is independent.
Polynomial multiply then reduce mod m(x)
a(x) · b(x) mod m(x)
Efficiently implemented via xtime() — left-shift + conditional XOR with 0x1B.
def xtime(a):
    """Multiply by x in GF(2^8)"""
    result = a << 1
    if result & 0x100:   # overflow?
        result ^= 0x11B  # reduce mod m(x)
    return result & 0xFF
Every non-zero element has a unique inverse: a⁻¹ such that a · a⁻¹ = 1
By Fermat's little theorem: a⁻¹ = a²⁵⁴ in GF(2⁸)
Or use the Extended Euclidean Algorithm.
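Both routes can be checked numerically. A self-contained sketch using square-and-multiply for a²⁵⁴ (`gf_inv` is an illustrative helper name):

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing mod m(x) = x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    """Multiply two GF(2^8) elements via shift-and-add over xtime."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a = xtime(a)
        b >>= 1
    return p

def gf_inv(a):
    """a^254 by square-and-multiply (0 maps to 0, matching the S-box convention)."""
    r = 1
    for bit in (1, 1, 1, 1, 1, 1, 1, 0):  # bits of 254, MSB first
        r = gf_mul(r, r)
        if bit:
            r = gf_mul(r, a)
    return r

assert gf_mul(0x57, 0x83) == 0xC1   # the worked product from FIPS 197 §4.2
assert gf_inv(0x53) == 0xCA         # matches the S-box example later
```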
FIPS 197 §4.2 · Presentation 01 covers GF(2⁸) in full depth
The only non-linear operation in AES. Each byte is independently transformed.
Step 1 Multiplicative inverse in GF(2⁸)
b = a⁻¹ (with 0⁻¹ defined as 0)
Provides non-linearity — max algebraic degree.
Step 2 Affine transformation over GF(2)
bᵢ′ = bᵢ ⊕ bᵢ₊₄ ⊕ bᵢ₊₅ ⊕ bᵢ₊₆ ⊕ bᵢ₊₇ ⊕ cᵢ (indices taken mod 8)
where c = 0x63 = 01100011₂
The affine step breaks the pure algebraic structure of the inverse, preventing interpolation attacks.
Non-linearity: 112 (best known for an 8-bit bijection)
Algebraic degree: 7
Differential uniformity: 4
Fixed points: 0 (no byte maps to itself)
No opposite fixed points
S(0x00) = 0x63 S(0x01) = 0x7C
S(0x53) = 0xED S(0xFF) = 0x16
Input 0x53: inverse in GF(2⁸) = 0xCA, then affine → 0xED
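The two-step construction can be verified bit by bit. A sketch (the inverse is found by brute-force search here, which is fine for a one-off table build; `sbox` is an illustrative helper name):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reducing mod 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a):
    """Brute-force inverse with 0 mapped to 0."""
    return next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)

def sbox(a):
    """S(a): invert in GF(2^8), then apply the affine transform with c = 0x63."""
    b, c, out = gf_inv(a), 0x63, 0
    for i in range(8):
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (c >> i)) & 1
        out |= bit << i
    return out

assert sbox(0x00) == 0x63 and sbox(0x53) == 0xED and sbox(0xFF) == 0x16
```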
| | .0 | .1 | .2 | .3 | .4 | .5 | .6 | .7 | .8 | .9 | .a | .b | .c | .d | .e | .f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0. | 63 | 7c | 77 | 7b | f2 | 6b | 6f | c5 | 30 | 01 | 67 | 2b | fe | d7 | ab | 76 |
| 1. | ca | 82 | c9 | 7d | fa | 59 | 47 | f0 | ad | d4 | a2 | af | 9c | a4 | 72 | c0 |
| 2. | b7 | fd | 93 | 26 | 36 | 3f | f7 | cc | 34 | a5 | e5 | f1 | 71 | d8 | 31 | 15 |
| 3. | 04 | c7 | 23 | c3 | 18 | 96 | 05 | 9a | 07 | 12 | 80 | e2 | eb | 27 | b2 | 75 |
| 4. | 09 | 83 | 2c | 1a | 1b | 6e | 5a | a0 | 52 | 3b | d6 | b3 | 29 | e3 | 2f | 84 |
| 5. | 53 | d1 | 00 | ed | 20 | fc | b1 | 5b | 6a | cb | be | 39 | 4a | 4c | 58 | cf |
| 6. | d0 | ef | aa | fb | 43 | 4d | 33 | 85 | 45 | f9 | 02 | 7f | 50 | 3c | 9f | a8 |
| 7. | 51 | a3 | 40 | 8f | 92 | 9d | 38 | f5 | bc | b6 | da | 21 | 10 | ff | f3 | d2 |
| 8. | cd | 0c | 13 | ec | 5f | 97 | 44 | 17 | c4 | a7 | 7e | 3d | 64 | 5d | 19 | 73 |
| 9. | 60 | 81 | 4f | dc | 22 | 2a | 90 | 88 | 46 | ee | b8 | 14 | de | 5e | 0b | db |
| a. | e0 | 32 | 3a | 0a | 49 | 06 | 24 | 5c | c2 | d3 | ac | 62 | 91 | 95 | e4 | 79 |
| b. | e7 | c8 | 37 | 6d | 8d | d5 | 4e | a9 | 6c | 56 | f4 | ea | 65 | 7a | ae | 08 |
| c. | ba | 78 | 25 | 2e | 1c | a6 | b4 | c6 | e8 | dd | 74 | 1f | 4b | bd | 8b | 8a |
| d. | 70 | 3e | b5 | 66 | 48 | 03 | f6 | 0e | 61 | 35 | 57 | b9 | 86 | c1 | 1d | 9e |
| e. | e1 | f8 | 98 | 11 | 69 | d9 | 8e | 94 | 9b | 1e | 87 | e9 | ce | 55 | 28 | df |
| f. | 8c | a1 | 89 | 0d | bf | e6 | 42 | 68 | 41 | 99 | 2d | 0f | b0 | 54 | bb | 16 |
Read as: S(0xRC) → row R, column C. Example: S(0x53) = row 5, col 3 → ED
A byte-level transposition that shifts each row cyclically to the left.
Before ShiftRows
After ShiftRows
| Row | Shift (bytes left) |
|---|---|
| Row 0 | 0 (no shift) |
| Row 1 | 1 |
| Row 2 | 2 |
| Row 3 | 3 |
Purpose: Prevents each column from being encrypted independently —
ensures bytes from different columns interact in MixColumns.
Each column is treated as a polynomial over GF(2⁸) and multiplied by a fixed MDS polynomial.
| 02 | 03 | 01 | 01 |
| 01 | 02 | 03 | 01 |
| 01 | 01 | 02 | 03 |
| 03 | 01 | 01 | 02 |
| S0 |
| S1 |
| S2 |
| S3 |
Only uses coefficients 01, 02, 03:
×01 = identity
×02 = xtime(a)
×03 = xtime(a) ⊕ a
Branch number = 5 (maximum possible for 4×4)
A difference in 1 input byte propagates to all 4 output bytes.
Any 2-byte input difference → at least 3-byte output difference.
For decryption, use the inverse matrix with coefficients:
0E, 0B, 0D, 09
These are more expensive to compute — a key factor in hardware design.
The polynomial c(x) = {03}x³ + {01}x² + {01}x + {02} is coprime to x⁴ + 1, guaranteeing invertibility.
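Invertibility can be spot-checked by applying the forward matrix and then the inverse matrix to a column (a sketch, not an optimized implementation; `mix_col` is an illustrative helper that multiplies by a circulant matrix given its first row):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reducing mod 0x11B."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p

def mix_col(col, row0):
    """Multiply a 4-byte column by the circulant matrix whose first row is row0."""
    return [
        gf_mul(row0[(0 - r) % 4], col[0]) ^ gf_mul(row0[(1 - r) % 4], col[1])
        ^ gf_mul(row0[(2 - r) % 4], col[2]) ^ gf_mul(row0[(3 - r) % 4], col[3])
        for r in range(4)
    ]

col = [0xDB, 0x13, 0x53, 0x45]                  # known MixColumns test column
fwd = mix_col(col, [0x02, 0x03, 0x01, 0x01])    # MixColumns
assert fwd == [0x8E, 0x4D, 0xA1, 0xBC]
assert mix_col(fwd, [0x0E, 0x0B, 0x0D, 0x09]) == col  # InvMixColumns restores it
```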
The simplest transformation: bitwise XOR of the state with the round key.
✓ XOR is its own inverse — same for encrypt/decrypt
✓ Zero-cost in hardware — just wire connections
✓ Introduces the secret key into every round
✓ Without it, the cipher would be a public permutation
Each 128-bit round key is drawn from the expanded key schedule:
Key[r] = W[4r] ‖ W[4r+1] ‖ W[4r+2] ‖ W[4r+3]
where W[] is the expanded key array (44 words for AES-128)
On a 32-bit CPU, AddRoundKey is 4 XOR operations (one per column-word).
state[c] ^= roundkey[c]; // c = 0..3
Expands the 128-bit cipher key into 44 × 32-bit words (11 round keys).
def key_expansion(key):
    W = [0] * 44
    # First 4 words = cipher key
    for i in range(4):
        W[i] = key_word(key, i)
    for i in range(4, 44):
        temp = W[i - 1]
        if i % 4 == 0:
            temp = sub_word(rot_word(temp))
            temp ^= RCON[i // 4 - 1]  # RCON is 0-indexed: Rcon_1 = RCON[0]
        W[i] = W[i - 4] ^ temp
    return W
RotWord — Rotate 4 bytes left by 1 position
[a₀ a₁ a₂ a₃] → [a₁ a₂ a₃ a₀]
SubWord — Apply S-box to each of 4 bytes
[a₀ a₁ a₂ a₃] → [S(a₀) S(a₁) S(a₂) S(a₃)]
Rcon[i] — Round constant
= [xⁱ⁻¹, 00, 00, 00] in GF(2⁸)
01, 02, 04, 08, 10, 20, 40, 80, 1B, 36
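The round-constant sequence is just repeated doubling in GF(2⁸), which is why 0x80 is followed by 0x1B. A quick sketch:

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing mod 0x11B on overflow."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

rcon, r = [], 0x01
for _ in range(10):
    rcon.append(r)
    r = xtime(r)          # next constant = previous * x

assert rcon == [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]
```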
Expands 192-bit key into 52 words (13 round keys)
Non-linear function (SubWord + RotWord + Rcon) applied every 6th word:
if i mod 6 == 0: apply g()
Otherwise: simple XOR chain
Expands 256-bit key into 60 words (15 round keys)
Extra SubWord applied at position 4 within each 8-word group:
if i mod 8 == 0: apply g()
if i mod 8 == 4: SubWord only
This extra non-linear step strengthens the key schedule against related-key cryptanalysis.
| Variant | Key Words (Nk) | Expanded Words | Round Keys | g() applied every |
|---|---|---|---|---|
| AES-128 | 4 | 44 | 11 | 4 words |
| AES-192 | 6 | 52 | 13 | 6 words |
| AES-256 | 8 | 60 | 15 | 8 words (+ SubWord at 4) |
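The table above can be folded into one expansion loop parameterized by Nk. A self-contained sketch (the S-box is rebuilt by brute force so the snippet runs on its own, and words are 4-byte tuples; this is an illustration, not the slides' reference code):

```python
def gf_mul(a, b):
    p = 0
    while b:
        if b & 1:
            p ^= a
        a = (a << 1) ^ (0x11B if a & 0x80 else 0)
        b >>= 1
    return p

def _rotl(x, k):
    return ((x << k) | (x >> (8 - k))) & 0xFF

def _sbox_entry(a):
    """Inverse in GF(2^8), then affine: b ^ b<<<1 ^ b<<<2 ^ b<<<3 ^ b<<<4 ^ 0x63."""
    inv = next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)
    return inv ^ _rotl(inv, 1) ^ _rotl(inv, 2) ^ _rotl(inv, 3) ^ _rotl(inv, 4) ^ 0x63

SBOX = [_sbox_entry(a) for a in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def key_expansion(key_bytes):
    """FIPS 197 key expansion for Nk = 4, 6 or 8 (AES-128/192/256)."""
    nk = len(key_bytes) // 4
    nr = nk + 6
    W = [tuple(key_bytes[4 * i:4 * i + 4]) for i in range(nk)]
    for i in range(nk, 4 * (nr + 1)):
        t = W[i - 1]
        if i % nk == 0:
            t = t[1:] + t[:1]                        # RotWord
            t = tuple(SBOX[b] for b in t)            # SubWord
            t = (t[0] ^ RCON[i // nk - 1],) + t[1:]  # Rcon
        elif nk == 8 and i % nk == 4:
            t = tuple(SBOX[b] for b in t)            # extra SubWord (AES-256)
        W.append(tuple(a ^ b for a, b in zip(W[i - nk], t)))
    return W

W = key_expansion(list(range(16)))       # key 000102...0f (FIPS 197 App. C)
assert len(W) == 44
assert W[4] == (0xD6, 0xAA, 0x74, 0xFD)  # first expanded word for that key
```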
AES decryption applies the inverse of each operation in reverse order.
AddRoundKey(State, Key[Nr])
for r = Nr−1 downto 1:
    InvShiftRows(State)
    InvSubBytes(State)
    AddRoundKey(State, Key[r])
    InvMixColumns(State)
InvShiftRows(State)
InvSubBytes(State)
AddRoundKey(State, Key[0])
Key insight: By reordering InvSubBytes↔InvShiftRows and InvMixColumns↔AddRoundKey, the inverse cipher has the same structure as encryption.
This requires applying InvMixColumns to round keys 1…Nr−1 during key expansion.
| Encrypt | Decrypt | Cost change |
|---|---|---|
| SubBytes | InvSubBytes | Same (separate LUT) |
| ShiftRows | InvShiftRows | Shift right |
| MixColumns ×{02,03} | InvMixColumns ×{09,0B,0D,0E} | ~3× more expensive |
| AddRoundKey | AddRoundKey | XOR = self-inverse |
On 32-bit processors, SubBytes + ShiftRows + MixColumns can be fused into four 256-entry 32-bit lookup tables.
T₀[a] = [ 02·S(a), S(a), S(a), 03·S(a) ]
T₁[a] = [ 03·S(a), 02·S(a), S(a), S(a) ]
T₂[a] = [ S(a), 03·S(a), 02·S(a), S(a) ]
T₃[a] = [ S(a), S(a), 03·S(a), 02·S(a) ]
eⱼ = T₀[a₀,ⱼ] ⊕ T₁[a₁,ⱼ₊₁] ⊕ T₂[a₂,ⱼ₊₂] ⊕ T₃[a₃,ⱼ₊₃] ⊕ Kⱼ (column indices taken mod 4)
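Note from the definitions that T₁–T₃ are byte rotations of T₀, so an implementation can store a single 1 KB table and rotate. A sketch (`build_t_tables` is an illustrative helper; pass in any full 256-entry S-box list):

```python
def xtime(a):
    """Multiply by x in GF(2^8), reducing mod 0x11B on overflow."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def build_t_tables(sbox):
    """T0[a] = (02*S(a), S(a), S(a), 03*S(a)); T1..T3 are byte rotations of T0."""
    T0 = [(xtime(s), s, s, xtime(s) ^ s) for s in sbox]
    T1 = [(e[3], e[0], e[1], e[2]) for e in T0]   # (03*S, 02*S, S, S)
    T2 = [(e[2], e[3], e[0], e[1]) for e in T0]   # (S, 03*S, 02*S, S)
    T3 = [(e[1], e[2], e[3], e[0]) for e in T0]   # (S, S, 03*S, 02*S)
    return T0, T1, T2, T3

sample = [0x63, 0x7C]   # first two S-box entries, just to exercise the builder
T0, T1, T2, T3 = build_t_tables(sample)
assert T0[0] == (0xC6, 0x63, 0x63, 0xA5)   # 02*63 = C6, 03*63 = A5
assert T1[0] == (0xA5, 0xC6, 0x63, 0x63)
```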
| Metric | S-box | T-table |
|---|---|---|
| Tables | 1 × 256 B | 4 × 1 KB |
| Total memory | 256 B | 4 KB |
| Lookups/round | 16 | 16 |
| XORs/round | many | 16 (32-bit) |
| Ops per round | ~160 | ~20 |
⚠ Cache-timing vulnerability
Table access patterns leak key-dependent indices. Bernstein (2005) demonstrated full AES key recovery from timing measurements alone.
Intel/AMD processors include dedicated AES instructions since Westmere (2010).
| Instruction | Operation |
|---|---|
| AESENC | One encryption round |
| AESENCLAST | Final encryption round |
| AESDEC | One decryption round |
| AESDECLAST | Final decryption round |
| AESKEYGENASSIST | Key schedule helper |
| AESIMC | InvMixColumns (for EqInv keys) |
AESENC — fully pipelined: sustained throughput of one round per cycle
With AES-NI, AES-128-CTR achieves ~4 cycles/byte on modern x86.
That's ~1 GB/s (≈8 Gbps) single-core at 4 GHz.
✓ Constant-time execution — immune to cache-timing attacks
✓ No S-box/T-table in memory — no table access patterns
✓ S-box computed in hardware via GF((2⁴)²) composite field
Intel® Advanced Encryption Standard Instructions (AES-NI) White Paper (2010) · ARM Cryptographic Extension (ARMv8-A, 2011)
AES hardware spans a wide design space from compact IoT to high-throughput network processors.
Single round, reused Nr times
Area: ~3,000–5,000 GE
Throughput: ~1–3 Gbps
Latency: 10–14 cycles
Ideal for area-constrained IoT & smart cards
Pipeline registers within a single round
Area: ~8,000–15,000 GE
Throughput: ~5–15 Gbps
Latency: 20–40 cycles
Balances area and speed
All 10/14 rounds instantiated in hardware
Area: ~50,000–170,000 GE
Throughput: ~30–53 Gbps
Latency: 10–14 cycles
Maximum throughput for network & SoC
| Architecture | Platform | Area | Throughput | Efficiency (Gbps/kGE) |
|---|---|---|---|---|
| 8-bit serial | 180nm ASIC | ~3.1 kGE | ~80 Mbps | 0.026 |
| Iterative 128-bit | Virtex-5 FPGA | 1,364 slices | 3.2 Gbps | — |
| Sub-pipelined | Virtex-6 FPGA | 2,100 slices | 12.8 Gbps | — |
| Composite-field (Mathew et al.) | 45nm ASIC | 56 kGE | 53 Gbps | 0.946 |
Mathew et al., "53 Gbps AES" IEEE JSSC 2011 · Chodowiec & Gaj, "Very Compact FPGA AES" (2003)
The S-box is the most expensive block in AES hardware. Two main approaches exist.
Store the 256×8-bit table directly:
✓ Simple — direct memory lookup
✓ FPGA: fits in one 18Kb BRAM
✗ 16 S-boxes needed → 16 BRAMs per round
✗ Larger area on ASIC
✗ Susceptible to power side-channels
Decompose GF(2⁸) into tower field GF((2⁴)²):
Inversion in GF(2⁸) → operations in GF(2⁴)
✓ ~20% fewer gates (Canright, 2005)
✓ Used in Intel AES-NI hardware
✓ Better for ASIC — pure logic
✗ More complex design verification
GF(2⁸) → isomorphic to GF((2⁴)²) → further to GF(((2²)²)²)
Each inversion reduces to multiplications and squarings in progressively smaller subfields, eventually reaching GF(2²) where inversion is just XOR + AND gates.
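The payoff of descending the tower is that inversion becomes trivial in the small field. A toy sketch in GF(2⁴) with m(y) = y⁴ + y + 1 (illustrative only — real composite-field S-boxes use carefully chosen bases and isomorphism maps, which are omitted here):

```python
def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo y^4 + y + 1 (0x13)."""
    p = 0
    for _ in range(4):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x10:
            a ^= 0x13
        b >>= 1
    return p

# A complete inversion table for GF(2^4) has only 16 entries --
# small enough to realize as pure combinational logic in hardware.
INV16 = [0] * 16
for a in range(1, 16):
    INV16[a] = next(b for b in range(1, 16) if gf16_mul(a, b) == 1)

assert all(gf16_mul(a, INV16[a]) == 1 for a in range(1, 16))
```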
Canright, "A Very Compact Rijndael S-box" (2005) · Mathew et al., IEEE JSSC (2011)
AES is a block cipher (128-bit). To encrypt arbitrary-length data, use a mode of operation.
Each block encrypted independently
✗ Deterministic — reveals patterns
✗ Never use for real data
Each block XORed with previous ciphertext
✓ Widely deployed (TLS ≤1.2)
✗ Sequential — can't parallelize encryption
✗ Padding oracle attacks
Encrypt counter → XOR with plaintext
✓ Fully parallelizable
✓ Encryption-only hardware needed
✗ No integrity protection
CTR encryption + GHASH authentication
✓ Authenticated encryption (AEAD)
✓ Fully parallelizable
✓ Hardware-friendly — GHASH uses GF(2¹²⁸)
The standard choice for TLS 1.3, IPsec, SSH
XTS — Tweakable, used for disk encryption
CCM — CTR + CBC-MAC, used in Wi-Fi (WPA2/3)
SIV — Nonce-misuse resistant AEAD
GCM-SIV — Nonce-misuse resistant GCM variant
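The counter-mode skeleton underlying CTR, GCM, CCM, and GCM-SIV is only a few lines: encrypt nonce‖counter, XOR with the data. A sketch with a stand-in permutation in place of AES (`toy_block_encrypt` is a placeholder, not a cipher; note the final keystream block is simply truncated, so no padding is needed):

```python
def ctr_encrypt(block_encrypt, nonce, data):
    """CTR mode: ciphertext = plaintext XOR E(nonce || counter).
    Encryption and decryption are the same operation."""
    out = bytearray()
    for ctr in range((len(data) + 15) // 16):
        counter_block = nonce + ctr.to_bytes(8, "big")  # 8-byte nonce || 8-byte counter
        keystream = block_encrypt(counter_block)
        chunk = data[16 * ctr : 16 * ctr + 16]
        out += bytes(p ^ k for p, k in zip(chunk, keystream))  # zip truncates last block
    return bytes(out)

# Toy 16-byte "permutation" standing in for AES -- for this demo only.
def toy_block_encrypt(block):
    return bytes(b ^ 0xA5 for b in block)

msg = b"attack at dawn"                            # 14 bytes: last block truncated
ct = ctr_encrypt(toy_block_encrypt, b"\x00" * 8, msg)
assert ctr_encrypt(toy_block_encrypt, b"\x00" * 8, ct) == msg  # self-inverse
```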
TIMING
T-table cache access patterns reveal key-dependent indices. Bernstein's attack (2005) recovered a complete AES server key from network timing measurements.
POWER / EM
Differential Power Analysis (DPA) — correlate power traces with hypothetical S-box outputs across many encryptions.
Target: first or last round SubBytes, where known plaintext/ciphertext meets key.
FAULT
Differential Fault Analysis (DFA) — inject faults in round 8/9, compare correct/faulty ciphertexts → recover key.
A single byte fault in round 9 can reduce the key search to ~2³².
The S-box is the primary target because:
1. Only non-linear operation → strongest correlation with power
2. Operates byte-by-byte → divide-and-conquer (16 × 2⁸ vs 2¹²⁸)
3. T-table indices directly reveal key bytes
Combined attacks exploit both fault injection and power analysis simultaneously, defeating higher-order masking countermeasures.
Roche et al. (2011), Dassance & Venelli (2012)
Cache attacks — Flush+Reload, Prime+Probe on shared L3 cache can extract AES keys across VMs.
MASKING
Split every sensitive variable into d+1 random shares such that x = x₁ ⊕ x₂ ⊕ … ⊕ x_(d+1)
d-th order masking requires d+1 shares; the attacker must jointly exploit d+1 points per trace.
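First-order Boolean masking is easy to sketch: a secret byte becomes random shares, and only shares are ever handled. Linear operations like AddRoundKey act on a single share; computing the non-linear S-box on shares is the hard part and is not shown (`mask`/`unmask` are illustrative helper names):

```python
import secrets

def mask(x, d=1):
    """Split byte x into d+1 shares with x = s_0 ^ s_1 ^ ... ^ s_d."""
    shares = [secrets.randbelow(256) for _ in range(d)]
    last = x
    for s in shares:
        last ^= s
    shares.append(last)          # final share makes the XOR equal x
    return shares

def unmask(shares):
    out = 0
    for s in shares:
        out ^= s
    return out

x = 0x3A
sh = mask(x, d=2)                # second-order: three shares
assert unmask(sh) == x
k = 0x7F
sh[0] ^= k                       # AddRoundKey touches only one share
assert unmask(sh) == x ^ k
```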
SHUFFLING
Randomize the order of S-box computations within a round (16 S-box calls can be permuted in 16! ways).
BITSLICING
Compute S-box as Boolean circuit — constant time, no table lookups. 32 AES blocks in parallel on 32-bit CPU.
DUAL-RAIL LOGIC
Each bit represented as complementary pair (a, ā). Constant Hamming weight per operation.
THRESHOLD IMPLEMENTATIONS
Hardware secret sharing with provable glitch resistance. Each share processed by independent logic.
REDUNDANCY (DFA defense)
Temporal: encrypt twice, compare
Spatial: TMR (triple modular redundancy)
Inverse check: decrypt ciphertext, compare with plaintext
NOISE INJECTION
Random delays, dummy operations, clock jitter to decorrelate power traces.
SBOX = [
0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76,
0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0,
# ... full 256-entry table ...
]
RCON = [0x01,0x02,0x04,0x08,0x10,0x20,0x40,0x80,0x1b,0x36]
def xtime(a): return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else (a << 1) & 0xFF
def gf_mul(a, b):  # multiply in GF(2^8) via repeated xtime
    p = 0
    for _ in range(8):
        if b & 1: p ^= a
        a = xtime(a); b >>= 1
    return p
def sub_bytes(state): return [SBOX[b] for b in state]
def shift_rows(s): return [s[0],s[5],s[10],s[15], s[4],s[9],s[14],s[3],
s[8],s[13],s[2],s[7], s[12],s[1],s[6],s[11]]
def mix_columns(s):
    r = []
    for c in range(4):
        i = c * 4
        r += [gf_mul(2,s[i]) ^ gf_mul(3,s[i+1]) ^ s[i+2] ^ s[i+3],
              s[i] ^ gf_mul(2,s[i+1]) ^ gf_mul(3,s[i+2]) ^ s[i+3],
              s[i] ^ s[i+1] ^ gf_mul(2,s[i+2]) ^ gf_mul(3,s[i+3]),
              gf_mul(3,s[i]) ^ s[i+1] ^ s[i+2] ^ gf_mul(2,s[i+3])]
    return r
def add_round_key(s, k): return [a ^ b for a, b in zip(s, k)]
def aes_encrypt(plaintext, key):  # both 16-byte lists
    rkeys = key_expansion(key)  # returns list of 11 × 16-byte round keys
    state = add_round_key(plaintext, rkeys[0])
    for r in range(1, 10):
        state = sub_bytes(state)
        state = shift_rows(state)
        state = mix_columns(state)
        state = add_round_key(state, rkeys[r])
    state = sub_bytes(state)
    state = shift_rows(state)
    state = add_round_key(state, rkeys[10])
    return state
module aes_round (
input logic [127:0] state_in,
input logic [127:0] round_key,
input logic last_round, // skip MixColumns if high
output logic [127:0] state_out
);
logic [127:0] after_sub, after_shift, after_mix;
// ---- SubBytes: 16 parallel S-box instances ----
genvar i;
generate
for (i = 0; i < 16; i++) begin : g_sbox
aes_sbox u_sbox (
.in (state_in[8*i +: 8]),
.out (after_sub[8*i +: 8])
);
end
endgenerate
// ---- ShiftRows: wire-level byte permutation ----
assign after_shift = {
after_sub[127:120], after_sub[ 87: 80], after_sub[ 47: 40], after_sub[ 7: 0],
after_sub[ 95: 88], after_sub[ 55: 48], after_sub[ 15: 8], after_sub[103: 96],
after_sub[ 63: 56], after_sub[ 23: 16], after_sub[111:104], after_sub[ 71: 64],
after_sub[ 31: 24], after_sub[119:112], after_sub[ 79: 72], after_sub[ 39: 32]
};
// ---- MixColumns: 4 column mixers ----
generate
for (i = 0; i < 4; i++) begin : g_mix
aes_mix_column u_mix (
.col_in (after_shift[32*i +: 32]),
.col_out (after_mix[32*i +: 32])
);
end
endgenerate
// ---- AddRoundKey ----
assign state_out = (last_round ? after_shift : after_mix) ^ round_key;
endmodule
| Platform | Implementation | Throughput | Latency | Notes |
|---|---|---|---|---|
| 8-bit AVR | Byte-serial, table S-box | ~1.5 Mbps | ~2,700 cycles | IoT / smart cards |
| ARM Cortex-M4 | Bitsliced | ~45 Mbps | ~480 cycles | Embedded |
| x86 (no AES-NI) | T-table, OpenSSL | ~3.2 Gbps | ~160 cycles | Software, 4 GHz |
| x86 (AES-NI) | AESENC pipeline | ~8 Gbps | ~40 cycles | Single-core CTR |
| Virtex-5 FPGA | Iterative 128-bit | ~3.2 Gbps | 11 cycles | 1,364 slices |
| Virtex-7 FPGA | Full pipeline | ~24 Gbps | 11 cycles | ~4,200 slices |
| 45nm ASIC | Composite S-box, full pipeline | ~53 Gbps | 10 cycles | 56 kGE, Mathew et al. |
| CUDA GPU | Parallel ECB/CTR | ~135 Gbps | — | RTX 4090, batch mode |
Biclique attack (Bogdanov et al., 2011)
AES-128: 2^126.1 AES-192: 2^189.7 AES-256: 2^254.4
Negligible improvement over brute force — only a ~2-bit gain.
Related-key attacks (Biryukov & Khovratovich, 2009)
AES-256: 2^99.5 (requires chosen related keys)
Impractical — requires the attacker to obtain encryptions under specially related keys, a setting well-designed protocols never expose.
Square / integral attacks
Effective up to 6–7 rounds. Full AES (10+ rounds) has comfortable security margin.
AES remains unbroken
No practical attack reducing security below the intended level exists as of 2025.
Grover's algorithm reduces brute-force search to 2^(n/2):
AES-128 → ~2⁶⁴ quantum ops
AES-256 → ~2¹²⁸ quantum ops
Recommendation: use AES-256 for quantum-resistant symmetric encryption.
CNSA 2.0 (NSA) mandates AES-256.
TLS 1.3 (AES-128/256-GCM)
IPsec / IKEv2
SSH
WPA3 (AES-CCM, AES-GCMP)
Signal Protocol (AES-256-CBC)
BitLocker (AES-XTS)
FileVault 2 (AES-XTS-128)
LUKS / dm-crypt
Android FBE
Self-Encrypting Drives (TCG Opal)
Intel AES-NI (Westmere+)
ARM Crypto Extensions (v8-A+)
RISC-V Zkn extension
Qualcomm Inline Crypto Engine
Apple Secure Enclave
Daemen & Rijmen's three core principles:
🔬
S-box from GF(2⁸) inverse — optimal non-linearity. MDS matrix for max diffusion. Wide trail strategy provides provable bounds against differential and linear cryptanalysis.
⚡
T-table fusion for 32-bit CPUs. Byte-oriented for 8-bit µC. Wire-only ShiftRows. Simple xtime() for MixColumns. Parallelizable key schedule.
📐
No secret design constants — everything derived from mathematical first principles. Simple algebraic structure enables thorough analysis and prevents suspicion of trapdoors.
Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002) ch. 5–9
Standards
FIPS 197 — Advanced Encryption Standard (AES), NIST, 2001
NIST SP 800-38A — Modes of Operation
NIST SP 800-38D — GCM Recommendation
CNSA 2.0 — NSA Quantum-Resistant Algorithm Suite
Design & Theory
Daemen & Rijmen, "The Design of Rijndael" (Springer, 2002)
Daemen & Rijmen, AES Proposal: Rijndael v2 (1999)
Shannon, "Communication Theory of Secrecy Systems" (1949)
Hardware
Canright, "A Very Compact Rijndael S-box" (2005)
Mathew et al., "53 Gbps AES in 45nm" IEEE JSSC (2011)
Gaj & Chodowiec, "FPGA and ASIC Implementations of AES" in Cryptographic Engineering (2009)
Cryptanalysis & Side-Channels
Bogdanov et al., "Biclique Cryptanalysis of AES" (ASIACRYPT 2011)
Biryukov & Khovratovich, "Related-Key Attacks on AES-256" (CRYPTO 2009)
Bernstein, "Cache-Timing Attacks on AES" (2005)
Kocher et al., "Differential Power Analysis" (CRYPTO 1999)
Part of the Modern Cryptography Presentation Series