🔐

Finite Field Arithmetic
for Cryptography

Theory, Standards, and Hardware/Software Implementation
Modern Cryptography Series · Presentation 1 of 10

Navigate: arrow keys or swipe · Press ESC for overview

Agenda

01 What is a Finite Field?

Algebraic foundations — groups, rings, fields

02 GF(p) — Prime Fields

Modular arithmetic, properties, and crypto primes

03 GF(2n) — Binary Extension Fields

Polynomial basis, irreducible polynomials, XOR arithmetic

04 Fields in Cryptographic Standards

AES, ECC curves, and why each field was chosen

05 Hardware Implementation

SoC, FPGA, software — Montgomery multiplication

06 SystemVerilog Examples

GF(28) multiplier, modular arithmetic RTL

07 Python Examples

Galois field classes, ECC field arithmetic

08 Performance & Trade-offs

LUT vs. combinational vs. pipelined comparison

What is a Finite Field?

Group
(G, +)
Closure, associativity, identity, inverses
Ring
(R, +, ×)
Group under +; closure & associativity under ×; distributive law
Field
(F, +, ×)
Ring where every non-zero element has a multiplicative inverse
Finite Field
GF(pn)
A field with a finite number of elements. Exists iff |F| = pn (p prime)

Key Properties

Exists uniquely for every prime power pn
Characteristic p: adding 1 to itself p times gives 0
GF(p): prime field — integers mod p
GF(pn): extension field — polynomials mod irreducible poly
All arithmetic stays within the field (closure)
Every non-zero element has a multiplicative inverse
Named after Évariste Galois (1811–1832)
Foundation of AES, ECC, error-correcting codes, and more

GF(p) — Prime Fields

GF(p) = { 0, 1, 2, …, p−1 }  where p is prime
All operations performed modulo p. Addition, subtraction, multiplication, and division (via modular inverse) produce results within the field.

Example: GF(7) Addition Table

+0123456
00123456
11234560
22345601
33456012
44560123
55601234
66012345

Notice the wrap-around — results never exceed 6

Prime Fields in Cryptography

P-256 128-bit security
p = 2256 − 2224 + 2192 + 296 − 1
TLS, ECDSA, ECDH, FIDO2
Curve25519 ~128-bit security
p = 2255 − 19
Signal, WireGuard, SSH, TLS 1.3
P-384 192-bit security
p = 2384 − 2128 − 296 + 232 − 1
Suite B, high-security TLS
P-521 ~260-bit security
p = 2521 − 1 (Mersenne prime)
Curve448 224-bit security
p = 2448 − 2224 − 1 (Goldilocks prime)

GF(2n) — Binary Extension Fields

Elements are polynomials over GF(2) with binary coefficients {0, 1}
Arithmetic mod an irreducible polynomial of degree n.  Addition = XOR,  Multiplication = polynomial multiply + reduce.

🔒 GF(28) — The AES Field

Irreducible polynomial (AES standard):

m(x) = x8 + x4 + x3 + x + 1  (0x11B)

256 elements — each is a byte (8 bits)

Add: 0x57 0x83 = 0xD4
01010111 XOR 10000011 = 11010100
Mul: 0x57 × 0x83 = 0xC1 (mod m(x))
Polynomial multiply, then reduce mod x8+x4+x3+x+1

Why Binary Extension Fields?

Addition = XOR → no carry propagation, single gate per bit
Maps naturally to digital hardware (bits = coefficients)
Constant-time operations → side-channel resistant
Efficient in HW (LUT, shift-XOR) and SW (bitwise ops)
Used in AES, GCM, CRC, Reed-Solomon, McEliece

The xtime operation (multiply by x) is a left-shift with conditional XOR — the fundamental building block of GF(2n) hardware.

Finite Fields in Cryptographic Standards

StandardFieldWhy This Field?Used For
AES (Rijndael)GF(28)
m(x)=x8+x4+x3+x+1
Byte-oriented; XOR = no carry chain; efficient S-box via multiplicative inverseSymmetric encryption (TLS, disk, VPN)
AES-GCMGF(2128)
m(x)=x128+x7+x2+x+1
GHASH authentication tag; 128-bit block matches AES; carry-less multiply (PCLMULQDQ)Authenticated encryption (HTTPS, IPsec)
NIST P-256GF(p), p=2256−2224+2192+296−1Pseudo-Mersenne prime → fast reduction via shift/add; 128-bit securityTLS certificates, code signing, FIDO2
Curve25519GF(p), p=2255−19Near-Mersenne prime → single-word reduction; constant-time ladder; cofactor 8Signal, WireGuard, SSH, TLS 1.3
Curve448GF(p), p=2448−2224−1Goldilocks prime: efficient Karatsuba; 224-bit securityHigh-security TLS, NIST SP 800-186
SM2GF(p), 256-bitNational standard prime; similar structure to P-256Chinese gov encryption/signatures
💡
Pseudo-Mersenne and Mersenne primes are preferred because reduction mod p becomes a series of additions and shifts — no general-purpose division needed.

Why Were These Fields Chosen?

🔒 AES: Why GF(28)?

Byte-oriented design
Each byte = one element of GF(28). Maps perfectly to 8-bit processors and memory widths.

S-Box = Multiplicative Inverse
Computes multiplicative inverse in GF(28) followed by affine transform. Provides optimal non-linearity against differential/linear cryptanalysis.

MixColumns = Matrix over GF(28)
Column polynomial multiplied by fixed polynomial c(x) mod x4+1. Provides diffusion.

No carry propagation
Addition is XOR — constant-time, no carries, no branching.

🔑 ECC: Why Pseudo-Mersenne Primes?

Fast modular reduction
For p = 2255−19, reducing mod p needs only multiply by 19 and add — no division. For P-256, reduction uses shifts and additions.

Constant-time implementation
Curve25519 uses a Montgomery ladder: every scalar bit executes identical operations. The prime allows this with efficient limb arithmetic.

Security margin
P-256 → 128-bit; P-384 → 192-bit; P-521 → ~260-bit. Curve25519 achieves ~128-bit with superior performance.

Rigidity / Verifiability
Curve25519: smallest p ≡ 5 (mod 8) near 2256. Transparent, auditable parameter choices.

Hardware Implementation of FF Arithmetic

⚙ SoC / ASIC

  • Dedicated crypto accelerator blocks (AES-NI in x86)
  • Composite field: GF(28) → GF(((22)2)2) reduces gates ~20%
  • Montgomery multipliers with systolic arrays for ECC
  • Side-channel: masking, dual-rail logic
  • Intel 45nm: 53 Gbps native GF((24)2) AES

🔧 FPGA

  • LUT-based S-box: ~32 LUTs per byte on Xilinx 7-series
  • Composite field S-box: Canright's method for compact design
  • Pipelined: 7+ Gbps on Virtex-E
  • Block RAM for precomputed tables (256×8 AES S-box)
  • DSP slices for large-field Montgomery multiply

💻 Software

  • Table-driven: 256-entry lookup for GF(28) (cache-timing risk)
  • Bitsliced: 64 blocks in parallel via 64-bit registers
  • SIMD: AVX2/NEON for parallel field ops
  • PCLMULQDQ for GF(2128) in AES-GCM
  • Limb arithmetic: 5×51-bit for Curve25519

Montgomery Multiplication for Hardware

Core Idea

Replace division by N with division by R (a power of 2)

MonPro(ã, b̃, N):
  R  = 2ⁿ    (n = bit-width of N)
  N' = -N⁻¹ mod R

  t = ã × b̃
  m = t × N' mod R      ← just mask lower n bits
  u = (t + m × N) / R   ← just right-shift n bits
  if u ≥ N: u = u − N
  return u               (= a·b·R mod N)
mod R = mask lower n bits
÷ R = right-shift by n bits
No expensive modular division!

Hardware Architectures

Bit-Serial

1 bit/cycle. Minimal area, n+1 cycles. IoT SoCs.

Word-Serial (CIOS)

w-bit words, w×w multiplier + CSA. Best FPGA ECC trade-off.

Systolic Array

Pipelined PEs, regular data flow. High throughput. Ideal for ASIC.

Full-Parallel

1-cycle multiply. Enormous area, maximum throughput. AES-NI.

SystemVerilog: GF(28) Multiplier

module gf2_8_mult (
  input  logic [7:0] a, b,
  output logic [7:0] p
);
  // AES irreducible polynomial: x^8 + x^4 + x^3 + x + 1
  localparam [8:0] POLY = 9'h11B;
  logic [7:0] partial [8];

  // Bit-serial "Russian peasant" multiply
  always_comb begin
    partial[0] = a;
    for (int i = 0; i < 8; i++) begin
      if (i == 0)
        p = b[0] ? a : 8'h00;
      else
        p = p ^ (b[i] ? partial[i] : 8'h00);

      // xtime: shift left, reduce if MSB was set
      if (i < 7)
        partial[i+1] = {partial[i][6:0], 1'b0}
                      ^ (partial[i][7] ? POLY[7:0] : 8'h00);
    end
  end
endmodule

How It Works

Russian Peasant Algorithm
Shift-and-add adapted for GF(2n). Process one bit per iteration.

xtime Operation
Left-shift by 1. If MSB was 1, XOR with reduction polynomial.

Irreducible Poly
0x11B ensures 256 unique elements in the field.

Synthesis
Fully combinational. ~100–150 gates. Can be pipelined.

Usage
AES MixColumns (×02, ×03) and S-box (inverse via Itoh-Tsujii).

SystemVerilog: Modular Arithmetic for GF(p)

module mod_add #(parameter WIDTH = 256)(
  input  logic [WIDTH-1:0] a, b, p,
  output logic [WIDTH-1:0] sum
);
  logic [WIDTH:0] temp;  // 1 extra bit for overflow
  always_comb begin
    temp = a + b;
    sum  = (temp >= {1'b0, p}) ? temp - p
                               : temp[WIDTH-1:0];
  end
endmodule

module mod_sub #(parameter WIDTH = 256)(
  input  logic [WIDTH-1:0] a, b, p,
  output logic [WIDTH-1:0] diff
);
  always_comb begin
    if (a >= b)
      diff = a - b;
    else
      diff = p - (b - a);  // wrap around via p
  end
endmodule

Design Notes

Parameterized
Same module for 255-bit (Curve25519), 256-bit (P-256), 384-bit (P-384).

Conditional Reduction
a+b < 2p, so one subtraction suffices.

Carry Bit
Extra bit catches overflow when a+b > 2WIDTH.

⚠ Side-Channel Warning
Conditional subtract creates data-dependent timing. Production crypto must use constant-time MUX selection.

Python: GF(28) for AES

class GF2_8:
    """Galois Field GF(2^8) with AES polynomial."""
    POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1

    def __init__(self, val):
        self.val = val & 0xFF

    def __add__(self, other):        # Addition = XOR
        return GF2_8(self.val ^ other.val)

    def __mul__(self, other):        # Russian peasant multiply
        a, b, p = self.val, other.val, 0
        for _ in range(8):
            if b & 1:  p ^= a
            hi = a & 0x80
            a = (a << 1) & 0xFF
            if hi:     a ^= self.POLY & 0xFF
            b >>= 1
        return GF2_8(p)

    def inverse(self):               # Fermat: a^(2^8 - 2) = a^254
        r = GF2_8(1)
        base = GF2_8(self.val)
        for bit in bin(254)[2:]:
            r = r * r
            if bit == '1':  r = r * base
        return r

# ── Usage ──
a, b = GF2_8(0x57), GF2_8(0x83)
print(f"0x57 * 0x83 = 0x{(a * b).val:02X}")       # 0xC1
print(f"inv(0x57)   = 0x{a.inverse().val:02X}")     # 0xFE
print(f"0x57 * inv  = 0x{(a * a.inverse()).val:02X}") # 0x01 ✓

Key Points

XOR Addition
No carry chain — just bitwise XOR. Subtraction is identical.

Shift-and-XOR Multiply
Mirrors the hardware: shift left, conditionally XOR with polynomial.

Fermat Inverse
a−1 = apn−2 by Fermat's little theorem. For GF(28): a254.

No Dependencies
Pure Python — no external libraries needed.

Python: GF(p) for Elliptic Curves

class GFp:
    """Prime field GF(p) arithmetic."""
    def __init__(self, val, p):
        self.val = val % p
        self.p = p

    def __add__(self, o):  return GFp(self.val + o.val, self.p)
    def __sub__(self, o):  return GFp(self.val - o.val, self.p)
    def __mul__(self, o):  return GFp(self.val * o.val, self.p)
    def __truediv__(self, o): return self * o.inverse()
    def __eq__(self, o):   return self.val == o.val
    def __repr__(self):    return f"GFp({self.val})"

    def inverse(self):     # Fermat's little theorem
        return GFp(pow(self.val, self.p - 2, self.p), self.p)

# ── Curve25519 prime field ──
p25519 = 2**255 - 19
a = GFp(42, p25519)
b = GFp(17, p25519)
assert a * b / a == b          # field division works ✓
assert a * a.inverse() == GFp(1, p25519)  # inverse ✓

# ── P-256 prime field ──
p256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
x = GFp(123456789, p256)
assert x * x.inverse() == GFp(1, p256)    # ✓

Key Points

Modular Arithmetic
Python's % operator handles the reduction. pow(a, p-2, p) uses fast modular exponentiation.

Division = Multiply by Inverse
a / b = a × b−1, where b−1 = bp−2 mod p.

Big Integer Support
Python natively handles 256-bit and 521-bit integers with arbitrary precision.

⚠ Not constant-time
Python's bigint operations are variable-time. Use libraries like pynacl or cryptography for production.

Performance & Trade-offs

AES-128 Throughput by Implementation

SW (OpenSSL)
3.2
FPGA LUT
5.5
FPGA Composite
7.0
FPGA Pipelined
21.5
ASIC (45nm)
53.0

Gbps · Sources: Canright (2005), McCanny (2003), Mathew/Intel (2011)

Implementation Trade-offs

ApproachAreaSpeedSide-Ch.
LUT / TableHigh (RAM)ModerateRisk
Composite FieldLow (~80 gates)ModerateGood
PipelinedMediumVery HighGood
Bit-sliced (SW)N/AHighExcellent
Full-parallelVery HighMaximumExcellent
⚖️
The fundamental trade-off: area vs. throughput vs. side-channel resistance. Composite-field decomposition wins on area; pipelining maximizes throughput; bit-slicing and constant-time algorithms provide side-channel resistance. Production systems must balance all three.

Key Takeaways

1  Finite fields are the algebraic foundation of nearly all modern cryptography — from AES to ECC to post-quantum schemes.

2  Two families dominate: GF(2n) for symmetric ciphers and GF(p) for public-key / elliptic curve cryptography.

3  Field parameters are chosen for both security and efficiency — Mersenne primes, carry-free XOR, structured reduction.

4  Hardware ranges from minimal bit-serial designs (IoT) to 53+ Gbps ASIC accelerators (data centers).

5  Side-channel resistance is as important as raw performance — constant-time code is essential.

References: FIPS 197 (AES) · NIST SP 800-186 (ECC) · RFC 7748 (Curve25519/448) · Canright, "A Very Compact Rijndael S-box"

Modern Cryptography Series

01
Finite Field Arithmetic
This presentation
02
Elliptic Curve Cryptography
Coming next
03
AES — Design & Implementation
Planned
04
Hash Functions & MACs
Planned
05
Public Key Cryptography
Planned
06
Digital Signatures (ECDSA, EdDSA)
Planned
07
Post-Quantum Cryptography
Planned
08
Key Exchange Protocols
Planned
09
Side-Channel Attacks
Planned
10
Crypto HW Accelerator Design
Planned

github.com/modern-cryptography  |  Contributions welcome!