PRESENTATION 09

Side-Channel Attacks
& Countermeasures

Timing · Power Analysis · EM Emanations · Cache Attacks · Fault Injection · Masking · Constant-Time Code


Modern Cryptography Series

Why Side Channels Matter

Cryptographic algorithms are mathematically secure — AES-256 resists 2²⁵⁶ brute-force operations. But every implementation runs on physical hardware that leaks information through observable side effects.


The Core Insight

Attackers don't break the math. They break the physics.

A side channel is any observable physical quantity — time, power, EM radiation, sound, heat — that correlates with secret data processed inside a device.

🔐

Algorithm Security
AES-128: 2¹²⁸ operations
≈ 10²¹ years @ 10 GHz

Side-Channel Attack
DPA on AES-128: 16 × 256 guesses
seconds with an oscilloscope

Taxonomy of Side-Channel Attacks

Channel | Observable | Access | Key Techniques | Cost
Timing | Execution time | Remote / Local | Cache timing, branch analysis | Low
Power | Current draw | Physical probe | SPA, DPA, CPA | Medium
Electromagnetic | EM radiation | Near-field probe | SEMA, DEMA, cross-device DL | Medium
Acoustic | Sound / vibration | Microphone | Coil whine, keyboard acoustics | Low
Cache / μArch | Cache state | Co-located SW | Flush+Reload, Prime+Probe, Spectre | Low
Fault Injection | Induced errors | Physical access | Voltage/clock glitch, laser, RowHammer | Medium–High

Classification: Passive (observe only) vs. Active (fault injection). Invasive (decap chip) vs. Non-invasive (external probes) vs. Semi-invasive (expose die, no contact).

Section I

Timing Attacks

When execution time reveals your secrets

Timing Attacks — Fundamentals

First described by Paul Kocher (1996). If a cryptographic operation takes different amounts of time depending on the secret key, an attacker can statistically recover the key by measuring execution times.


Common Timing Leaks

  • Conditional branches on secret bits (if key_bit → multiply, else skip)
  • Early-exit comparisons (memcmp returns at first mismatch)
  • Data-dependent table lookups (cache hits vs. misses)
  • Variable-time multiplication (leading zeros skip)

Classic: RSA Square-and-Multiply

# VULNERABLE — timing leaks key bits
def modexp_vulnerable(base, exp, mod):
    result = 1
    for i in range(exp.bit_length()-1, -1, -1):
        result = (result * result) % mod   # always square
        if (exp >> i) & 1:                 # branch on secret!
            result = (result * base) % mod # multiply only if bit=1
    return result
# Multiply is ~2x slower than square-only
# → attacker times each iteration
# → recovers every bit of the exponent (secret key)

Fix: Always execute both operations — discard unwanted result via constant-time select.
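A sketch of that fix in Python — illustrative only, since Python's arbitrary-precision integers are not constant-time at machine level (the function name and fixed `bits` parameter are choices for this example; the bit count must be public so the loop length leaks nothing):

```python
def modexp_always_multiply(base, exp, mod, bits):
    """Square-and-always-multiply with a branchless select."""
    result = 1
    for i in range(bits - 1, -1, -1):
        result = (result * result) % mod      # always square
        tmp = (result * base) % mod           # always multiply
        bit = (exp >> i) & 1
        mask = -bit                           # all-ones if bit=1, else 0
        # branchless select: keep tmp when bit=1, keep result when bit=0
        result = (tmp & mask) | (result & ~mask)
    return result
```

Production code performs the same trick in constant-time assembly (or uses a Montgomery ladder), where the select compiles to masked moves rather than branches.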

Constant-Time String Comparison

❌ Vulnerable (early-exit)

// Standard memcmp — LEAKS position of first difference
int memcmp(const void *a, const void *b, size_t n) {
    const uint8_t *p = a, *q = b;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != q[i])
            return p[i] - q[i];  // exits early!
    }
    return 0;
}
// Timing reveals how many bytes match
// → attacker brute-forces one byte at a time
// → O(256 × n) instead of O(256^n)

✅ Constant-Time

// Crypto-safe comparison — always same time
int ct_memcmp(const void *a, const void *b, size_t n) {
    const volatile uint8_t *p = a, *q = b;
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        diff |= p[i] ^ q[i]; // accumulate diffs
    }
    // No early exit, no branch on secret data
    return diff; // 0 if equal, nonzero otherwise
}
// Runs in exactly the same time regardless
// of where (or whether) a mismatch occurs

Used in OpenSSL's CRYPTO_memcmp(), libsodium's sodium_memcmp(), and Go's crypto/subtle.ConstantTimeCompare().
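Python's standard library ships the same primitive, implemented in C with the identical accumulate-all-differences pattern:

```python
import hmac

# Constant-time comparison from the stdlib — runtime does not
# depend on where (or whether) the inputs first differ.
ok = hmac.compare_digest(b"s3cret-tag", b"s3cret-tag")
bad = hmac.compare_digest(b"s3cret-tag", b"s3cret-taG")
```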

Constant-Time Conditional Select

Replace every secret-dependent branch with branchless bitwise arithmetic:

❌ Branching on secret

// Branch predictor leaks which path taken
if (secret_bit) {
    result = value_a;
} else {
    result = value_b;
}

✅ Branchless select

// Constant-time conditional select
// mask = 0xFFFFFFFF if bit=1, 0x00000000 if bit=0
uint32_t mask = -(uint32_t)secret_bit;
result = (value_a & mask) | (value_b & ~mask);
// Both values computed, correct one selected
// No branch, no timing difference

Rules for Constant-Time Cryptographic Code

1. No secret-dependent branches
2. No secret-dependent memory indices
3. No secret-dependent loop bounds
4. No variable-time arithmetic (div, mod)
5. No early-exit comparisons
6. Avoid compiler optimizations that reintroduce branches
7. Verify with tools: ct-verif, dudect, timecop
8. Use volatile to prevent dead-code elimination
9. Test on actual target hardware

Section II

Power Analysis Attacks

Reading secrets from current consumption

Power Consumption in CMOS

In CMOS logic, dynamic power is consumed when transistors switch state. The power drawn by a gate depends on the Hamming weight and Hamming distance of the data it processes.


P_dynamic = α · C_L · V_dd² · f

where α = switching activity (data-dependent!), C_L = load capacitance, V_dd = supply voltage, f = clock frequency.


Key insight: α changes with data values. A 0xFF → 0x00 transition (8 bits flip) draws more power than 0xFF → 0xFE (1 bit flip). This creates a measurable correlation between data and power.

Power Models

Hamming Weight Model

Power ∝ number of 1-bits in the value being processed.

P ≈ a · HW(v) + b + noise

Best for: bus loads, register writes where prior state ≈ 0.

Hamming Distance Model

Power ∝ number of bit transitions between consecutive values.

P ≈ a · HD(v_old, v_new) + b + noise

Best for: registers, data buses with known prior value.
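Both models take only a few lines to state; the coefficients a, b and the noise σ are device-specific (the values below are arbitrary placeholders for illustration):

```python
import random

def hamming_weight(v: int) -> int:
    return bin(v).count("1")

def hamming_distance(v_old: int, v_new: int) -> int:
    return hamming_weight(v_old ^ v_new)

def simulated_power(v_old, v_new, a=0.5, b=1.0, sigma=0.1):
    # HD model: P ≈ a · HD(v_old, v_new) + b + noise
    return a * hamming_distance(v_old, v_new) + b + random.gauss(0, sigma)

hamming_weight(0xFF)          # 8 — all bits set
hamming_distance(0xFF, 0xFE)  # 1 — single bit flip
```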

Simple Power Analysis (SPA)

SPA extracts secrets by directly inspecting a single (or few) power traces. Different operations produce visually distinct power signatures.


RSA via SPA

Square-and-multiply RSA: a square operation has a shorter, lower-amplitude power pattern than a multiply. By reading the trace visually:

SM  SM  S   S   SM  S   SM  S   S   S   SM  ...
↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓
1   1   0   0   1   0   1   0   0   0   1   ...

Each multiply reveals a 1 bit; square-only reveals 0. A single trace can recover a 2048-bit RSA private key.

Simulated SPA Trace

Simulated RSA exponentiation power trace. S = square, SM = square + multiply. Key bits readable directly from amplitude pattern.

Countermeasure: Montgomery Ladder

Always perform both square and multiply, regardless of key bit. Identical operation sequence for 0 and 1 — SPA sees no difference.

Differential Power Analysis (DPA)

Introduced by Paul Kocher, Joshua Jaffe, and Benjamin Jun (1999). Uses statistical analysis across many traces to extract keys even when individual traces are noisy.

1. Collect N power traces (N ≈ 1,000–50,000)
2. Guess a key sub-byte (256 hypotheses)
3. Predict power for each guess (HW/HD model)
4. Correlate prediction vs. measured traces (Pearson ρ)
5. Correct guess = highest ρ → KEY RECOVERED

AES Key Recovery — The Attack

Target: output of first-round SubBytes, S[p[i] ⊕ k[i]]. For each key byte hypothesis k*, compute HW of S[p[i] ⊕ k*] for all traces. Correlate against measured power at the SubBytes clock cycle. The correct k* yields the highest Pearson correlation. Repeat for all 16 bytes → full 128-bit key in 16 × 256 = 4,096 correlation tests.

Correlation Power Analysis (CPA)

Refinement of DPA using Pearson correlation coefficient instead of difference-of-means. More statistically efficient — requires fewer traces. Standard attack methodology today.

ρ(t, k*) = Cov(H_k*, T_t) / (σ_H · σ_T)

DPA / CPA — Interactive Demonstration

Simulated CPA correlation traces for an AES-128 SubBytes attack, showing how the correct key byte (hypothesis 0xAB) rises above the wrong guesses as the trace count increases.


Cyan = correct key hypothesis (0xAB). Grey = wrong hypotheses (sample of 20). X-axis = number of traces. Y-axis = |Pearson ρ|.

Template Attacks

The most powerful form of profiled side-channel attack (Chari et al., 2002). Given access to an identical template device, an attacker can build a statistical model and then break a target device with as few as a single trace.


Two-Phase Process

Phase 1: Profiling

On a device you control, run encryptions with known keys. For each key byte value, record many traces and compute the multivariate Gaussian distribution (mean vector μ, covariance matrix Σ) of the power at selected "points of interest."

Phase 2: Matching

Capture trace(s) from the target device. Evaluate the log-likelihood of each template. The key hypothesis with maximum likelihood is the recovered key.

Mathematical Model

P(trace | key = k) = N(μ_k, Σ_k)
k* = argmax_k log P(trace | key = k)
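The two phases can be sketched with a univariate Gaussian template on synthetic leakage — a deliberately minimal stand-in (random S-box, one point of interest, equal variances) rather than the full multivariate model:

```python
import random, statistics

rng = random.Random(7)
SBOX = list(range(256))
rng.shuffle(SBOX)                   # stand-in S-box
HW = [bin(v).count("1") for v in range(256)]
SIGMA = 0.8

def leak(p, k):                     # one synthetic point-of-interest sample
    return HW[SBOX[p ^ k]] + rng.gauss(0, SIGMA)

# Phase 1 — profiling on a controlled device (key known, here 0x00):
# estimate the mean leakage of each Hamming-weight class.
samples = {h: [] for h in range(9)}
for _ in range(50):
    for p in range(256):
        samples[HW[SBOX[p]]].append(leak(p, 0))
mu = {h: statistics.mean(v) for h, v in samples.items()}

# Phase 2 — matching: a handful of traces from the target device.
TRUE_KEY = 0x3C
attack = [(p, leak(p, TRUE_KEY))
          for p in (rng.randrange(256) for _ in range(40))]

def log_likelihood(k):              # Gaussian log-likelihood (equal variances)
    return -sum((t - mu[HW[SBOX[p ^ k]]]) ** 2 for p, t in attack)

k_star = max(range(256), key=log_likelihood)   # maximum-likelihood key
```

Real template attacks use mean vectors and covariance matrices over many points of interest, which is what lets them approach single-trace recovery.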

Deep Learning Extensions

Modern attacks replace the Gaussian template with neural networks (CNNs, MLPs). Benefits:

  • No need to manually select points of interest
  • Handle misaligned traces (jitter) automatically
  • Cross-device attacks: train on device A, attack device B — demonstrated to succeed with as few as a single trace
  • Effective even against first-order masking

Section III

Electromagnetic Emanations

Secrets broadcast over the air

Electromagnetic Side-Channel Attacks

Current flow in IC wires produces electromagnetic radiation. A near-field EM probe (loop antenna, 50–500 μm) placed near the chip surface can capture signals that correlate with internal data processing.


Advantages Over Power Analysis

  • Spatial resolution — probe individual circuit blocks, not just global power rail
  • Non-contact — no need to modify power supply lines
  • Bypass decoupling capacitors — EM radiates directly from die, unfiltered
  • Works through packaging — even epoxy-encapsulated chips

SEMA = Simple EM Analysis (like SPA). DEMA = Differential EM Analysis (like DPA). CEMA = Correlation EM Analysis (like CPA). Same statistical techniques, different physical observable.

Attack Setup

📡  EM Probe → 🔧 LNA → 📊 Oscilloscope → 💻 CPA Software

Near-field H-probe → Low-noise amplifier → High-speed ADC → Statistical analysis

STELLAR Countermeasure

Route the crypto core exclusively within lower metal layers (M1–M4), then surround it with a Current-Domain Signature Attenuation (CDSA) shield in upper metals. The correlated EM signature is suppressed by 100–1000× before reaching the chip surface.


Cross-Device Deep Learning (X-DeepSCA)

Train a CNN on EM traces from Device A, attack Device B — same architecture, different manufacturing lot. Demonstrated to extract AES-256 keys with practical trace counts, proving device-specific noise is not a reliable defense.

Section IV

Cache & Microarchitectural Attacks

Software-only side channels — no oscilloscope required

Cache Timing Attacks

Modern CPUs use multi-level caches (L1: ~1 ns, L2: ~5 ns, L3: ~15 ns, DRAM: ~60 ns). These timing differences are a side channel when crypto code performs secret-dependent memory accesses.


AES T-Table Attack (Bernstein 2005)

Standard AES implementations use lookup tables (T-tables, 4 × 1 KB). The index into each table depends on plaintext ⊕ key. An attacker on the same machine measures which cache lines were accessed, revealing which table entries were used, directly leaking key-dependent information.

T-table index = p_i ⊕ k_i
cache line = index >> 4 → leaks the 4 MSBs of (p_i ⊕ k_i)
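The leak is plain arithmetic: with 4-byte table entries and 64-byte lines, 16 entries share a line, and observing which line was touched hands the attacker the top nibble of the key byte (toy values below):

```python
KEY_BYTE = 0xA7            # secret — arbitrary toy value
plaintext = 0x3C           # known to the attacker

index = plaintext ^ KEY_BYTE       # T-table index (8 bits)
observed_line = index >> 4         # 16 entries per 64-byte cache line

# Attacker inverts the relation using the known plaintext:
recovered = observed_line ^ (plaintext >> 4)
# recovered == KEY_BYTE >> 4 — the top nibble of the key byte
```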

Major Cache Attack Techniques

Flush+Reload

Shared memory required. Attacker flushes a cache line, waits for victim, then reloads — fast reload = victim accessed it. Granularity: cache line (64 bytes).

Prime+Probe

No shared memory needed. Attacker fills cache set with own data, waits, then re-accesses — slow access = victim evicted a line. Works across VMs on shared L3.

Evict+Time

Evict specific cache sets, time victim's total execution. Variation in time reveals which sets victim accesses.

Spectre & Meltdown

Spectre (2018)

Exploits speculative execution and branch prediction. The CPU speculatively executes instructions beyond a branch before the condition is resolved. Even though results are rolled back architecturally, cache state changes persist and are observable via timing.

Variant 1 (Bounds Check Bypass): Trick victim into speculatively accessing out-of-bounds memory → data encoded into cache → Flush+Reload to read it.

Variant 2 (Branch Target Injection): Poison branch predictor to redirect speculative execution to attacker-chosen gadget.

Meltdown (2018)

Exploits out-of-order execution with delayed exception handling. CPU executes a kernel memory read before the privilege check completes. The data reaches the cache before the fault is raised.

Impact: Read arbitrary kernel memory from userspace. All kernel address space — including every process's passwords, keys, and secrets — accessible to any unprivileged process.


Mitigations

KPTI (unmap kernel pages), retpoline (replace indirect branches), IBRS/IBPB (hardware predictor barriers), DAWG (cache partitioning), microcode patches. Performance cost: 5–30% on some workloads.

Section V

Fault Injection Attacks

Active attacks — inducing errors to extract secrets

Fault Injection — Methods

Voltage Glitching

Briefly drop/spike Vdd to violate setup/hold timing. Causes bit flips in registers or skipped instructions. Cost: < $100 (ChipWhisperer). Requires removal of decoupling capacitors for best results.

Practicality: Very High

🕐

Clock Glitching

Insert an extra short clock pulse or shorten a cycle. CPU reads register before stable — causes computation errors. Cost: < $200. Temporal precision ~0.17 ns achievable on FPGA-based injectors.

Practicality: Very High

📡

EM Fault Injection

High-current EM pulse from a coil induces local currents in the die. Combines spatial targeting with moderate cost. Does not require board modification. Cost: ~$1–5K.

Practicality: High

💥

Laser Fault Injection

IR laser focused on exposed die creates electron-hole pairs → bit flips in SRAM/logic. Highest precision (single-transistor targeting). Requires chip decapsulation. Cost: $30–100K.

🔨

RowHammer (Software FI)

Repeated DRAM row activations cause charge leakage in adjacent cells → bit flips. No physical access needed — purely software. Demonstrated: privilege escalation, TLS key extraction, VM escape. Affects DDR3/DDR4; TRR mitigations (and early DDR5 defenses) shown partially bypassable.

Differential Fault Analysis (DFA)

Introduced by Biham & Shamir (1997). Inject a fault during encryption, then compare the correct and faulty ciphertexts. The difference reveals information about the secret key.


DFA on AES — Piret & Quisquater (2003)

Inject a single-byte fault between the MixColumns of rounds 8 and 9. ShiftRows and MixColumns diffuse the fault through the final rounds, producing a structured difference in the ciphertext. By analyzing the difference between correct and faulty outputs:

  • One faulty ciphertext narrows each 4-byte group of the last round key to ~2⁸ hypotheses
  • A second fault makes each 32-bit key fragment unique
  • Two correct/faulty ciphertext pairs → complete AES-128 round key → full key via key schedule inversion

DFA on RSA-CRT

Bellcore Attack (Boneh et al., 1997): If RSA uses CRT and a fault occurs in one of the two modular exponentiations:

ŝ = correct mod p, faulty mod q
gcd(ŝᵉ − m, N) = p → factorization of N!

A single faulty signature completely breaks RSA. This is why RSA-CRT implementations must verify signatures before output.
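The attack reproduces end-to-end with toy parameters (tiny primes and a made-up `sign_crt` helper, purely for illustration — real keys are 2048+ bits):

```python
import math

# Toy RSA-CRT parameters
p, q = 1009, 1013
N, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def sign_crt(m, fault=False):
    sp = pow(m, d % (p - 1), p)     # half-size exponentiation mod p
    sq = pow(m, d % (q - 1), q)     # half-size exponentiation mod q
    if fault:
        sq ^= 1                     # single induced bit flip in the q branch
    # CRT recombination
    return (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % N

m = 42
s_faulty = sign_crt(m, fault=True)
# ŝ^e ≡ m (mod p) still holds, but not mod q → gcd exposes p
recovered_p = math.gcd((pow(s_faulty, e, N) - m) % N, N)
```

One faulty signature factors N, which is exactly why CRT implementations verify `pow(s, e, N) == m` before releasing the signature.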


Countermeasures

Redundant computation — compute twice, compare results
Infection — if fault detected, corrupt output completely
Temporal redundancy — re-execute and compare
Error-detecting codes — parity/CRC on intermediate values
Environmental sensors — voltage, clock, temperature monitors

Section VI

Countermeasures

Defending implementations against physical leakage

Masking — The Primary Algorithmic Defense

Masking splits every sensitive intermediate value into d+1 random shares. Each share is statistically independent of the secret. An attacker must combine information from all shares simultaneously to learn anything — requiring (d+1)th-order analysis.


Boolean Masking

x = x₀ ⊕ x₁ ⊕ ... ⊕ x_d

Each share xᵢ (for i > 0) is uniformly random. x₀ = x ⊕ x₁ ⊕ ... ⊕ x_d. Any strict subset of shares is uniformly distributed — reveals zero information about x.


Arithmetic Masking

x = x₀ + x₁ + ... + x_d (mod 2ⁿ)

Used for ARX ciphers (SPECK, ChaCha20). Converting between Boolean and arithmetic masking requires dedicated secure algorithms.

Masking AES SubBytes

# First-order masked SubBytes (d=1)
# Assumes `sbox` is the 256-entry AES S-box table.
import secrets

def masked_subbytes(x0, x1):
    """Shares satisfy x = x0 ⊕ x1 (true value)."""
    # Re-mask: generate a fresh output mask
    r = secrets.randbelow(256)
    # Pre-compute masked S-box: S'[a] = S[a ⊕ x1] ⊕ r  (for all a)
    masked_sbox = [sbox[a ^ x1] ^ r
                   for a in range(256)]
    y0 = masked_sbox[x0]   # lookup with share x0
    y1 = r                 # new mask
    # Now y0 ⊕ y1 = S[x0 ⊕ x1] = S[x]
    return y0, y1

Security vs. Order

dth-order masking protects against attacks combining up to d intermediate values. Cost grows as O(d²) in computation and O(d) in randomness. Typical: d=1 (first-order) for smart cards, d=2–3 for high-assurance.

Threshold Implementations (TI)

Masking in hardware must handle glitches — transient signal transitions that can combine shares within a single clock cycle, leaking information even if the steady-state design is secure.


TI Properties (Nikova, Rechberger, Rijmen 2006)

1. Correctness

Combining output shares yields the correct function output.

2. Non-Completeness

Each output share is computed from an incomplete set of input shares — at least one share of every input is missing from each component function. This ensures glitches within a single combinational block cannot recombine all shares.

3. Uniformity

Output shares are uniformly distributed if input shares are. Required for composability of multiple TI stages.

TI-Protected AND Gate (d=1, 3 shares)

// Inputs: a = (a0, a1, a2), b = (b0, b1, b2)
// where a = a0 ⊕ a1 ⊕ a2, b = b0 ⊕ b1 ⊕ b2
// Output shares of c = a AND b:

assign c0 = (a0 & b0) ^ (a0 & b1) ^ (a1 & b0);
assign c1 = (a1 & b1) ^ (a1 & b2) ^ (a2 & b1);
assign c2 = (a2 & b2) ^ (a2 & b0) ^ (a0 & b2);
// c0 ⊕ c1 ⊕ c2 = a & b  ✓
// Each c_i depends on only 2 of 3 input shares ✓
// Non-completeness: c0 has no a2,b2 etc.

Evolution

  • TI → first-order glitch-resistant, limited to low algebraic degree
  • DOM (Domain-Oriented Masking) → arbitrary order, less area/randomness than TI
  • HPC2 / PINI → provably composable at arbitrary order, multivariate secure
  • AGEMA → automated tool to convert unprotected HDL → masked implementation

Hiding Countermeasures

Instead of removing the correlation (masking), hiding makes the leakage harder to exploit by adding noise or randomizing timing.

Amplitude Hiding

  • Dual-rail logic (WDDL, SABL): every gate computes both f and f̄ — power is constant regardless of data
  • Current flattening: on-chip voltage regulator with fast feedback loop to suppress data-dependent current variations
  • Decoupling & filtering: large on-die capacitors smooth supply current
  • CDSA / ASNI: signature attenuation hardware embeds crypto core, suppresses correlated signal at power/EM ports

Temporal Hiding

  • Random delays: insert variable-length dummy operations between rounds (degrades alignment of traces)
  • Shuffling: randomize execution order of independent operations (e.g., AES processes 16 S-boxes in random order each encryption)
  • Random clock jitter: hardware clock with random frequency variation
  • Attacker can counter with trace alignment algorithms — works best combined with masking
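A shuffling sketch in Python (illustrative only — `SBOX` is a stand-in table, and real designs draw the permutation from a hardware TRNG):

```python
import random

_rng = random.SystemRandom()
SBOX = list(range(256))
random.Random(0).shuffle(SBOX)       # stand-in for the AES S-box table

def subbytes_shuffled(state):
    """Apply the S-box to all 16 state bytes in a fresh random order."""
    order = list(range(16))
    _rng.shuffle(order)              # new permutation every encryption
    for i in order:                  # which byte runs at which clock cycle
        state[i] = SBOX[state[i]]    # is now unpredictable to the attacker
    return state
```

The result is identical regardless of order — only the time at which each byte's leakage appears changes, which spreads a first-order leak across 16 positions.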

Physical Shielding

  • Active mesh / shield: metal layers carrying random signals over the crypto core; tampering breaks the mesh → device zeroizes
  • EM shielding: Faraday cage, on-chip shield metals routed above sensitive logic
  • Tamper-detect sensors: voltage, temperature, light, laser detectors that trigger key erasure
  • Used in: smart cards (EMV), HSMs, military crypto

Constant-Time Programming in Practice

The Compiler Problem

You write constant-time C. The compiler "optimizes" it back into branching code. This is a fundamental tension — compilers are designed to make code fast, not constant-time.

// You write:
uint32_t mask = -(uint32_t)(a == b);
result = old ^ (mask & (old ^ new_val));

// Compiler may emit:
//   CMP a, b
//   JNE skip      ← branch reintroduced!
//   MOV result, new_val
// skip:

Defenses

  • Use volatile to prevent optimization
  • Write in assembly for critical paths
  • Use -O0 or specific compiler barriers
  • Verify final binary with tools like ct-verif, Binsec/Rel

Libraries That Get It Right

libsodium / NaCl — Designed constant-time from the ground up. Curve25519 in 32-bit arithmetic, no table lookups, no secret-dependent branches.

OpenSSL (post-2014) — Constant-time BN operations, bitsliced AES, blinded RSA. Legacy codepaths still occasionally vulnerable.

BoringSSL — Google's fork with aggressive constant-time discipline. All crypto ops verified.

AES-NI — Intel hardware AES instructions are inherently constant-time (no table lookups, fixed latency). Eliminates cache-timing attacks on AES entirely.


Verification Tools

ct-verif · dudect · timecop · Binsec/Rel · CacheAudit · ct-wasm

Test Vector Leakage Assessment (TVLA)

TVLA (Goodwill et al., 2011) is a standardized methodology to detect whether a device leaks side-channel information — without performing a full key-recovery attack.


The t-Test Approach

Collect two sets of power traces:

  • Set A: encrypt a fixed plaintext repeatedly
  • Set B: encrypt random plaintexts

At each time sample, compute Welch's t-statistic between Set A and Set B.

t = (μ_A − μ_B) / √(s_A²/n_A + s_B²/n_B)

If |t| > 4.5 at any time point → statistically significant leakage detected (p < 10⁻⁵). The device fails TVLA.
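A minimal fixed-vs-random t-test on synthetic single-sample leakage (assumed Hamming-weight model with arbitrary key and plaintext values; the fixed plaintext is chosen so its leakage class sits away from the random-set mean):

```python
import math, random, statistics

rng = random.Random(3)
HW = [bin(v).count("1") for v in range(256)]
KEY = 0x5A

def sample(p):                          # one leaky time sample: HW + noise
    return HW[p ^ KEY] + rng.gauss(0, 1.0)

n = 2000
set_a = [sample(0xA5) for _ in range(n)]                   # fixed plaintext
set_b = [sample(rng.randrange(256)) for _ in range(n)]     # random plaintexts

def welch_t(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(set_a, set_b)
# |t| far exceeds 4.5 — the simulated device leaks and fails TVLA
```

A real evaluation computes this t-value at every time sample of the trace and flags any point crossing ±4.5.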

Interpreting Results

Simulated TVLA t-test. Red horizontal lines = ±4.5 threshold. Cyan trace = t-value over time. Spikes exceeding threshold indicate leakage points.


Higher-Order TVLA

For masked implementations, apply preprocessing before the t-test: square the centered trace values (second-order), or compute products of pairs of samples. If the second-order t-test shows |t| > 4.5 on a first-order masked design, the masking is broken. Extend to nth-order for nth-order masking evaluation.

Section VII

Hardware Countermeasures

Silicon-level defenses for crypto accelerators

SoC-Level Side-Channel Defenses

On-Chip Noise & Attenuation

Signature Attenuation Hardware (SAH)

Embed crypto core within a dedicated power regulation block. The SAH absorbs the data-dependent current signature internally, presenting only a flat (or noise-dominated) current profile to the external power pin. Demonstrated 100–1000× increase in traces-to-disclosure on AES-256.

On-Die Voltage Regulators

Integrated LDO or switched-capacitor regulator with fast transient response. Supply current drawn from external pin is filtered — attacker sees regulator's signature, not crypto engine's.

Random Number Generators

High-quality TRNG essential for masking, shuffling, blinding. Common: ring-oscillator based, metastable latch, or SRAM PUF-derived entropy. Must pass NIST SP 800-90B health tests.

Tamper Detection & Response

Environmental Sensors

Voltage glitch detectors (under/overvoltage), clock frequency monitors, temperature sensors, light sensors (detect decapsulation/laser). On anomaly: halt operation, zeroize keys, alert host.

Active Shield Mesh

Top metal layers carry a random bit pattern continuously checked by the security controller. Any physical probing or FIB (focused ion beam) edit that cuts a mesh wire triggers immediate key zeroization. Used in EMV chips, HSMs, Apple Secure Enclave.

Secure Boot & Key Storage

Keys stored in OTP/eFuse or SRAM PUF — never exposed on bus. Crypto operations execute inside a hardened island with dedicated clock, power, and bus isolation. ARM TrustZone CryptoCell, Intel AES-NI, dedicated secure elements (SE050, ATECC608).

Cost of Protection — Area & Performance

Implementation | Order | Area (GE) | Throughput | Randomness / enc | Traces to break
AES-128 (unprotected) | — | ~2,400 | 54 cycles | 0 bits | ~500
AES-128 + 1st-order TI | d=1 | ~7,500 | 266 cycles | ~48 bits | ~50,000
AES-128 + 1st-order DOM | d=1 | ~5,800 | ~108 cycles | ~18 bits/round | ~50,000
AES-128 + 2nd-order DOM | d=2 | ~11,000 | ~216 cycles | ~54 bits/round | ~10⁸
AES-128 + shuffling + mask | d=1 | ~8,200 | ~160 cycles | ~128 bits | ~10⁶
AES-128 + SAH wrapper | — | ~4,000 + SAH | 54 cycles | 0 bits | > 10⁷
GE = gate equivalents (1 GE ≈ 2-input NAND). Cycle counts for round-based architectures at 1 round/cycle. Traces-to-break assumes first-order CPA attack on the respective protection level. Values are approximate — vary with technology node, implementation style, and attack sophistication.

Section VIII

Real-World Case Studies

When theory meets silicon

Notable Side-Channel Attacks in History

EMV Smart Cards (2000s)

DPA on DES/3DES implementations in banking smart cards recovered PIN encryption keys. Led to mandatory SPA/DPA resistance testing in Common Criteria (EAL4+ with AVA_VAN.5). Modern EMV chips use masking + active shields + environmental sensors.

Power Analysis

OpenSSL RSA via Cache (2005)

Bernstein demonstrated timing-based key extraction from OpenSSL AES over a network. Percival extracted RSA keys via L1 cache contention on hyperthreaded Intel CPUs. Led to constant-time bignum operations and AES-NI adoption.

Cache Timing

TEMPEST / Van Eck (1985–)

NSA's classified program (later declassified) demonstrated reconstruction of CRT monitor images from EM emanations at distance. Van Eck published civilian version in 1985. Modern extensions: recovering LCD content, keyboard keystrokes from EM.

EM Emanation

Spectre / Meltdown (2018)

Affected virtually all modern CPUs (Intel, AMD, ARM). Spectre exploited branch prediction; Meltdown exploited out-of-order execution. Industry-wide response: OS patches, microcode updates, CPU redesigns. Performance impact: 5–30%.

Microarchitectural

Plundervolt / VoltPillager (2019–21)

Software-controlled undervolting (via DVFS interface) induced faults inside Intel SGX enclaves, breaking AES-NI integrity. VoltPillager showed hardware voltage injection bypassed all software mitigations. Led to Intel locking DVFS in SGX mode.

Fault Injection

ROCA — Infineon TPM (2017)

RSA key generation in Infineon TPMs used a flawed fast-prime algorithm. Analysis of the generated public keys revealed their structure, allowing factorization of affected 2048-bit RSA keys from the public key alone. Affected millions of Estonian ID cards, YubiKeys, and TPM-protected machines.

Algorithmic + SC

Defense in Depth — Layered Protection

No single countermeasure is sufficient. Real-world secure implementations combine multiple layers:

Algorithm Design
SW Implementation
HW Countermeasures
Physical
Testing

Algorithm Layer

• Algorithms with inherent resistance (e.g., Keccak's sponge has no S-box lookup)
• Avoid RSA-CRT without verification
• Prefer X25519 over NIST P-256 (easier to implement in constant time)

Software Layer

• Constant-time code discipline
• Boolean/arithmetic masking
• Shuffling of independent operations
• Blinding (randomize inputs to foil DPA)
• Verified with ct-verif / dudect

Hardware + Physical Layer

• Threshold implementations in RTL
• On-chip noise injection / SAH
• Environmental sensors & active mesh
• TVLA validation at silicon bring-up
• Common Criteria / FIPS 140-3 certification

Blinding — Protocol-Level Defense

RSA Blinding

Randomize the input to the private-key operation so the attacker cannot correlate known inputs with power/timing traces.

# RSA signature with base blinding
import math, secrets

def rsa_sign_blinded(m, d, e, n):
    # Fresh random blinding factor, coprime to n
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    r_inv = pow(r, -1, n)              # modular inverse (Python 3.8+)

    # Blind the message
    m_blind = (m * pow(r, e, n)) % n

    # Sign the blinded message — traces correlate with m_blind, not m
    s_blind = pow(m_blind, d, n)

    # Unblind: s_blind = m^d · r^(ed) = m^d · r (mod n)
    s = (s_blind * r_inv) % n
    return s  # correct signature on m, decorrelated traces

Every invocation uses a fresh random r, so the intermediate values during exponentiation are completely randomized. Attacker's traces are decorrelated from the actual key bits.

ECC Scalar Blinding

For elliptic curve scalar multiplication k·G:

Scalar Blinding

Replace k with k + r·n where n is the group order and r is random. Since n·G = O (point at infinity), the result is unchanged: (k + r·n)·G = k·G. But the scalar bits seen by the multiplier are randomized.
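The identity behind scalar blinding is easy to check in the multiplicative-group analogue of scalar multiplication, where pow(g, k, p) plays the role of k·G and the group order is p − 1 (toy values chosen for this sketch):

```python
import secrets

p = 2**127 - 1          # a Mersenne prime; the group order divides p - 1
g, k = 3, 0xDEADBEEF    # "base point" and secret scalar — arbitrary toy values
order = p - 1

r = secrets.randbelow(2**32)        # fresh random blinding value
blinded_scalar = k + r * order      # k + r·n

# By Fermat's little theorem g^(p-1) ≡ 1 (mod p), so the blinded exponent
# yields the same result — but the exponent bits processed by the
# square-and-multiply loop change on every invocation.
assert pow(g, blinded_scalar, p) == pow(g, k, p)
```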

Point Blinding

Add a random point R to the input, compute k·(G + R), then subtract k·R. Requires precomputation of k·R via a separate (protected) path.


Projective Coordinate Randomization

Represent point (X, Y, Z) as (λX, λY, λZ) for random λ. Same projective point, but all intermediate coordinate values are randomized. Near-zero overhead.

Summary & Key Takeaways

Attack Surface

Every physical implementation leaks information through time, power, EM, cache state, sound, and fault responses. Mathematical security guarantees do not automatically transfer to real hardware/software.

Passive Attacks

SPA/DPA/CPA extract keys by observing power consumption. Cache timing attacks (Flush+Reload, Spectre) work purely in software. Template attacks and deep-learning SCA approach single-trace key recovery.

Active Attacks

Voltage/clock glitching, EMFI, and laser FI induce faults for DFA. RowHammer is a software-only fault injection. A single faulty RSA-CRT signature breaks the key.

Countermeasures

Masking (Boolean, arithmetic, threshold implementations) removes statistical dependence. Hiding (shuffling, dual-rail logic, noise injection) reduces SNR. Constant-time code eliminates timing channels. Blinding randomizes protocol inputs. Hardware sensors detect fault injection.

Defense in Depth

No single technique is foolproof. Production systems layer masking + hiding + constant-time + blinding + hardware sensors + TVLA testing. Certification standards (Common Criteria, FIPS 140-3, EMVCo) mandate multi-layer SCA resistance.

Ongoing Arms Race

Deep-learning SCA, cross-device attacks, and microarchitectural exploits continue to advance. Post-quantum algorithms (ML-KEM, ML-DSA) introduce new SCA surfaces. The field remains an active frontier of both attack and defense research.

References & Further Reading

Foundational Papers

• Kocher, "Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems" (1996)
• Kocher, Jaffe, Jun, "Differential Power Analysis" (1999)
• Chari, Rao, Rohatgi, "Template Attacks" (CHES 2002)
• Biham & Shamir, "Differential Fault Analysis" (1997)
• Boneh, DeMillo, Lipton, "On the Importance of Checking Crypto Computations" (1997)
• Piret & Quisquater, "A DFA Technique Against SPN Structures" (2003)
• Bernstein, "Cache-Timing Attacks on AES" (2005)

Spectre / Meltdown

• Kocher et al., "Spectre Attacks: Exploiting Speculative Execution" (2018)
• Lipp et al., "Meltdown: Reading Kernel Memory from User Space" (2018)

Countermeasures

• Nikova, Rechberger, Rijmen, "Threshold Implementations Against Side-Channel Attacks and Glitches" (2006)
• Ishai, Sahai, Wagner, "Private Circuits: Securing Hardware Against Probing Attacks" (2003)
• Das & Sen, "EM and Power SCA: Advanced Attacks and Low-Overhead Countermeasures" (2020)
• Schneider, Moradi, Güneysu, "Arithmetic Addition over Boolean Masking" (2015)
• Goodwill et al., "TVLA — A Testing Methodology for SCA Resistance" (2011)

Standards & Specifications

• NIST SP 800-90B (Entropy Sources)
• FIPS 140-3 (Cryptographic Module Validation)
• Common Criteria ISO/IEC 15408 (EAL4+ AVA_VAN.5)
• EMVCo Security Evaluation Standards