🔒
Modern Cryptography Series · Presentation 08

Fully Homomorphic
Encryption

Compute on ciphertext without decrypting — from Gentry's blueprint through CKKS, TFHE and hardware acceleration to real-world deployment
Lattices BGV · BFV · CKKS TFHE · FHEW Bootstrapping HW Acceleration
Arrow keys or swipe to navigate · Esc for overview · F for fullscreen

The Holy Grail of Cryptography

"Can we delegate computation to an untrusted party without revealing our data?"

🔴 Traditional Cloud

  • Encrypt data at rest & in transit
  • Must decrypt to compute
  • Cloud provider sees plaintext
  • Data breaches expose secrets

🟢 FHE Cloud

  • Client encrypts input locally
  • Server computes on ciphertext
  • Server never sees plaintext
  • Client decrypts result only
Enc(x) ⊕ Enc(y) = Enc(x + y)    Enc(x) ⊗ Enc(y) = Enc(x · y)

FHE supports arbitrary functions over encrypted data — both addition and multiplication, composable without limit

Homomorphic Encryption Taxonomy

ClassOperationsDepthExample
PHE — Partially+ or × (not both)UnlimitedPaillier, El Gamal
SHE — Somewhat+ and ×Limited LBGV (small circuits)
LHE — Levelled+ and ×Fixed L (no boot)BFV, CKKS, BGV (levelled)
FHE — Fully+ and ×UnlimitedBGV+Boot, TFHE, FHEW

Why depth matters

Each multiplication consumes a "level" of noise headroom. Without bootstrapping (refreshing the ciphertext), depth is bounded. FHE = LHE + bootstrapping.

Mathematical Foundations

Ring Learning With Errors, polynomial rings, and the noise growth that makes FHE both possible and hard

Learning With Errors (LWE)

The LWE Problem

Given many samples (aᵢ, bᵢ) where:

bᵢ = aᵢ · s + eᵢ (mod q)
  • s — secret vector ∈ ℤₙ
  • aᵢ — uniformly random ∈ ℤₙₙ
  • eᵢ — small error from χ (Gaussian)

Finding s is computationally hard — as hard as worst-case lattice problems (Regev 2005).

Why error?

Without noise, solving for s is trivial Gaussian elimination. The error makes the system overdetermined and noisy — no efficient algorithm known.

Security

Reducible from GapSVP and SIVP — hard even for quantum computers.

Ring-LWE — Compact & Efficient

Polynomial Ring

R = ℤ[x] / (xⁿ + 1)
  • n — power of 2 (512, 1024, 2048, 4096…)
  • Cyclotomic polynomial xⁿ + 1 gives good structure
  • Rq = R / qR for ciphertext modulus q

RLWE Sample

(a, b = a·s + e) ∈ Rq × Rq

One polynomial encodes n coefficients — RLWE packs n × more data per sample than plain LWE.

Why xⁿ + 1?

  • NTT-friendly — enables O(n log n) polynomial multiplication
  • Negacyclic convolution keeps coefficients bounded
  • Provably as hard as LWE (Lyubashevsky, Peikert, Regev 2010)

Parameters

nlog₂ qSecurity
102427~128-bit
204854~128-bit
4096109~128-bit
8192218~128-bit

Noise Growth — The Central Challenge

Noise after operations

  • Addition: noise grows linearly — eᵣ = e₁ + e₂
  • Multiplication: noise grows multiplicatively — eᵣ ≈ e₁ · e₂ · q

After L multiplications: noise ≈ (initial noise) × qᴸ

Decryption failure

Decryption requires |noise| < q/2. When noise exceeds this bound, decryption produces garbage. This is the hard ceiling on circuit depth for SHE/LHE.

Noise management techniques

  • Modulus switching — scale down q after each multiplication to reduce noise magnitude proportionally
  • Key switching — refresh keys after relinearisation (eliminates degree-2 ciphertexts)
  • Rescaling — CKKS variant: divide by Δ to absorb noise into precision loss
  • Bootstrapping — homomorphically decrypt to reset noise to fresh level

FHE Scheme Landscape

BGV, BFV, CKKS, TFHE, FHEW — design rationale, native plaintext space, and target workloads

Gentry's Blueprint (2009)

The Breakthrough

Craig Gentry (Stanford / IBM) proved the first FHE construction. Key insight: self-referential bootstrapping — use the scheme to evaluate its own decryption circuit.

FHE = SHE with depth(Cdec + 1) ≤ L

Squashing trick

Original SHE decryption was too deep. Gentry added a hint in the public key that reduces decryption to a sparse-subset-sum — shallow enough to evaluate homomorphically.

Bootstrapping in one line

  • Let ct = Enc(m) with high noise
  • Publish sk encrypted: ct_sk = Enc(sk)
  • Evaluate Decsk(ct) homomorphically using ct_sk
  • Result: fresh Enc(m) with reset noise

Historical impact

  • 2009: Gentry STOC (IDEAL lattices)
  • 2011: BGV — ring-based, practical
  • 2012: BFV — Brakerski full-RNS
  • 2017: CKKS — approximate arithmetic
  • 2019: TFHE — fast gate-level FHE

BGV & BFV — Integer Arithmetic

BGV (Brakerski-Gentry-Vaikuntanathan 2011)

  • Plaintext space: ℤₜ[x]/(xⁿ+1) — small integer polynomial ring modulo plaintext prime t
  • Noise management: modulus switching — drop one modulus level per multiplication
  • Multi-level ciphertext modulus: q = q₀·q₁·…·qₗ (RNS representation)
  • Efficient for levelled FHE; bootstrapping optional

BFV (Fan-Vercauteren / Brakerski 2012)

  • Plaintext space: same — ℤₜ[x]/(xⁿ+1)
  • Key difference: noise managed via scale invariance — divide by Δ = q/t after multiplication
  • Simpler modulus structure; single large q
  • Preferred for: integer lookup tables, exact arithmetic, database queries

BGV vs BFV — practical guidance

BGV is faster when circuit depth L is known at compile time (levelled mode). BFV's simpler structure makes it easier to implement and tune. Both are available in Microsoft SEAL, OpenFHE, and HElib. Choose BGV for deep circuits, BFV for shallow encrypted integer operations.

CKKS — Approximate Real Arithmetic

Cheon-Kim-Kim-Song (2017)

Treats noise as rounding error rather than decryption failure. Enables native floating-point-like operations over encrypted real (and complex) numbers.

Dec(ct) ≈ m (within precision Δ)
  • Encode real vector m via SIMD slots using CRT/FFT
  • Scaling factor Δ = 2³⁰ or 2⁶⁰ encodes the fractional precision
  • Rescaling after × absorbs noise into least-significant precision bits

CKKS SIMD packing

A single ciphertext encodes n/2 complex values simultaneously (Galois automorphism rotations allow inter-slot operations).

  • n = 8192 → 4096 slots
  • n = 16384 → 8192 slots
  • Throughput: millions of multiplications per second on encrypted real data

Best for

  • Neural network inference on private data
  • Private genomics (GWAS, PRS)
  • Encrypted statistics and ML training
  • Financial risk models

TFHE & FHEW — Fast Gate-Level FHE

FHEW (Ducas-Micciancio 2015)

  • Encrypts single bits (or small integers) using plain LWE
  • Bootstrapping takes ~100ms on CPU — fast enough to be practical
  • Key tool: ring GSW external product for blind rotation
  • Evaluate any Boolean gate and bootstrap simultaneously

TFHE (Chillotti et al. 2016/2020)

  • Torus LWE: coefficients in ℝ/ℤ (the torus 𝕋)
  • Bootstrapping in <10ms on CPU, <1ms on GPU
  • Any programmable bootstrapping: gate + function evaluation in one step
  • Used in: Concrete (Zama), OpenFHE

TFHE vs CKKS — choosing the right tool

PropertyTFHECKKS
PlaintextBits / small integersReals / complex (SIMD)
Per-gate bootYes — automaticNo — amortised over slots
Best forControl flow, exact logicML inference, statistics
Bootstrapping<10ms / gateSeconds / level refresh

GSW & the External Product

GSW Encryption (Gentry-Sahai-Waters 2013)

Encrypts message μ as a matrix C ≈ μ · G where G is a gadget matrix. Multiplication with another ciphertext uses the gadget decomposition:

C₁ ⊗ C₂ = C₁ · G⁻¹(C₂)

Noise grows linearly in the decomposition bound rather than multiplicatively — key to efficient bootstrapping.

Blind Rotation (FHEW/TFHE)

Bootstrap = evaluate f(x) = testVector[x mod N] homomorphically using a polynomial accumulator rotated by encrypted LWE coefficients:

  1. Initialise ACC = testVector · Xᵇ ∈ RLWE
  2. For each LWE coefficient aᵢ: ACC ← blind-rotate by aᵢ using RGSW(sᵢ)
  3. Extract LWE sample from ACC
  4. Key-switch back to original LWE key

Programmable bootstrapping

The testVector can encode any function f: ℤ_N → ℤ_N — including non-linear activations like ReLU — making TFHE natural for encrypted neural inference.

Bootstrapping — Noise Reset

CKKS Bootstrapping (Han-Ki 2020)

Three homomorphic steps:

  1. CoeffToSlot — linear map via DFT-like matrix
  2. EvalMod — sine approximation to wrap coefficients mod q (polynomial approximation of periodic function)
  3. SlotToCoeff — inverse linear map

Bootstraps n/2 slots in parallel. Throughput: ~10⁶ slot-bootstraps/sec on 1 CPU core.

Bootstrapping cost (CKKS)

nSlotsTime (CPU)
2¹⁴8192~2s
2¹⁵16384~6s
2¹⁶32768~25s

GPU and hardware acceleration reduce these by 100–1000× — see Hardware section.

Amortisation

Bootstrapping all n/2 slots simultaneously means cost-per-slot can be <1μs on optimised hardware, enabling practical encrypted ML inference.

Key Operations & Operations

Relinearisation, key switching, rotation, RNS representation and NTT — the building blocks of every FHE library

Relinearisation & Key Switching

The degree problem

Multiplying ciphertexts (c₀, c₁) × (c₀', c₁') produces a degree-2 ciphertext (d₀, d₁, d₂) — decryptable only with s² in the secret key. Each further multiply increases degree.

Relinearisation (Relin)

Uses a relinearisation key rlk = Enc(s²) to homomorphically eliminate d₂, returning to degree-1 at the cost of one extra inner product with rlk.

(d₀, d₁, d₂) + d₂ · rlk → (d₀', d₁')

Key Switching

Reencrypts a ciphertext under a different secret key. Uses a key switching key ksk = Enc_{sk_new}(sk_old). Required after:

  • Relinearisation (degree reduction)
  • Galois rotations (slot rotation)
  • Bootstrapping key change

Cost: n large polynomial multiplications — typically the bottleneck.

Galois Rotations

Apply automorphism x→x^(2k+1) to rotate SIMD slots cyclically. Required for inner product, convolution over plaintext slots.

RNS & NTT — The Computational Engine

Residue Number System (RNS)

Large ciphertext modulus q is decomposed as product of small NTT-friendly primes:

q = q₀ · q₁ · … · q_{L-1} (each ~60 bits)
  • All arithmetic stays within 64-bit machine words
  • Modular reduction is free — no multi-precision arithmetic
  • Modulus switching = drop one qᵢ, rescale
  • Used by: OpenFHE, SEAL, HElib (CKKS, BGV, BFV)

Number Theoretic Transform (NTT)

Polynomial multiplication in Rq via pointwise product in NTT domain:

a · b = INTT(NTT(a) ⊙ NTT(b))
  • O(n log n) vs O(n²) schoolbook
  • Each NTT: n · log₂n butterfly operations
  • Hardware: same butterfly datapath as post-quantum (ML-KEM, ML-DSA)
  • NTT is shared IP across FHE, PQC, ZK-SNARKs

Dominant cost breakdown

Key switching: 70% · NTT ops: 60% of key switching · Bandwidth: often the true bottleneck at scale

FHE Libraries & Ecosystem

Open-source FHE libraries, standards efforts (HE-Std), and the compiler toolchain from high-level program to encrypted circuit

FHE Library Landscape

LibrarySchemesLanguageFocus
Microsoft SEALBFV, CKKSC++17, PythonIndustry-grade, well-documented
OpenFHEBFV, BGV, CKKS, FHEW, TFHEC++17Research + production; successor to PALISADE
HElibBGV, CKKSC++IBM; first practical FHE library
HEAANCKKSC++Original CKKS reference (Seoul Nat'l Univ)
ConcreteTFHERust + PythonZama; compiler from Python to FHE circuit
LattigoBFV, BGV, CKKS, RLWEGoMultiparty FHE, cloud-native

HE-Std / HomomorphicEncryption.org

Industry consortium (Microsoft, IBM, Intel, Google, Stanford, MIT…) producing API standards and security parameter recommendations. Security estimates for RLWE at 128/192/256-bit levels — the basis for all library default parameters.

Code: CKKS Encrypted Dot Product

import seal
from seal import EncryptionParameters, scheme_type, SEALContext
from seal import KeyGenerator, Encryptor, Evaluator, Decryptor, CKKSEncoder

# ── Parameters ──────────────────────────────────────────────────────────────
parms = EncryptionParameters(scheme_type.ckks)
parms.set_poly_modulus_degree(8192)                  # n = 2^13 → 4096 slots
parms.set_coeff_modulus(seal.CoeffModulus.Create(
    8192, [60, 40, 40, 60]))                         # RNS chain: 60+40+40+60 bits
scale = 2**40                                        # Δ encodes ~12 decimal digits

ctx   = SEALContext(parms)
keygen   = KeyGenerator(ctx)
pk, sk   = keygen.create_public_key(), keygen.secret_key()
relin_ks = keygen.create_relin_keys()
gal_ks   = keygen.create_galois_keys()              # for rotations

enc  = Encryptor(ctx, pk)
evl  = Evaluator(ctx)
dec  = Decryptor(ctx, sk)
coder = CKKSEncoder(ctx)

# ── Encode and encrypt two real vectors ─────────────────────────────────────
a = [1.5, 2.3, 0.7, 4.1]
b = [0.5, 1.0, 2.0, 0.25]

pt_a = coder.encode(a, scale)
pt_b = coder.encode(b, scale)

ct_a = enc.encrypt(pt_a)
ct_b = enc.encrypt(pt_b)

# ── Compute encrypted dot product via multiply-then-rotate-and-add ──────────
ct_prod = evl.multiply(ct_a, ct_b)
evl.relinearize_inplace(ct_prod, relin_ks)
evl.rescale_to_next_inplace(ct_prod)               # scale: Δ² → Δ

# Sum all slots: accumulate with rotations
ct_sum = ct_prod
for i in [1, 2]:
    ct_rot = evl.rotate_vector(ct_prod, i, gal_ks)
    ct_sum = evl.add(ct_sum, ct_rot)

# ── Decrypt ─────────────────────────────────────────────────────────────────
pt_result = dec.decrypt(ct_sum)
result = coder.decode(pt_result)

print(f"Encrypted dot product ≈ {result[0]:.4f}")  # expect ≈ 4.13 (a·b summed)
print(f"Plaintext reference   = {sum(x*y for x,y in zip(a,b)):.4f}")

Uses Microsoft SEAL Python bindings. All arithmetic — multiply, relinearise, rescale, rotate — runs entirely on ciphertext. The server never sees a or b.

FHE Compiler Toolchain

The compilation problem

Mapping a high-level program to FHE requires:

  • Circuit extraction — replace branches, loops with arithmetic gates
  • Scheme selection — CKKS for reals, BFV/TFHE for integers/bits
  • Parameter generation — n, q chain, scale, levels sufficient for circuit depth
  • Operator scheduling — minimise bootstraps via level optimisation
  • Slot packing — SIMD batching across data items

Compiler projects

  • HEIR (Google, 2024) — MLIR-based FHE compiler; CKKS/CGGI backends, open source
  • Concrete (Zama) — Python to TFHE; transparent parameter selection
  • EVA (Microsoft) — CKKS vectoriser; automatic rescaling insertion
  • HECO — HElib-targeting, slot rotation optimiser
  • Cingulata / Parasol — Boolean circuit compilers for FHEW/TFHE

HEIR architecture

Source → MLIR high-level dialect → secret-type propagation → scheme selection → parameter synthesis → LLVM → CPU/GPU/FPGA

Hardware Acceleration

GPU, FPGA, and ASIC approaches — NTT units, key-switching datapaths, and the state of FHE silicon

GPU Acceleration

Why GPUs?

  • NTT and polynomial multiply are embarrassingly parallel — thousands of independent butterfly units map directly to GPU SIMD lanes
  • RNS decomposition: all qᵢ operations are independent → SIMD across RNS limbs
  • Ciphertext elements fit in L2/L3 (10–100 MB) — GPU HBM bandwidth avoids DRAM bottleneck

GPU libraries

  • cuFHE — CUDA TFHE bootstrapping; <1ms/gate
  • 100x — Nvidia CKKS on A100: 150× CPU speedup
  • OpenFHE-GPU — CUDA backend (RTX 3090)
  • PhantomFHE — CKKS CUDA; 25× SEAL CPU

Benchmarks (CKKS, n=2¹⁵)

PlatformNTT (ms)KeySwitch (ms)Boot (s)
CPU (single core)1238018
CPU (32-core)0.5140.8
RTX 30900.081.20.15
A100 SXM40.030.40.05

Sources: Bossuat et al. (2023), PhantomFHE (2024), estimated from literature

FPGA Acceleration

FPGA advantages for FHE

  • Custom bit-widths — 27-bit Barrett modular multipliers for each RNS prime exactly fit DSP slices
  • Deterministic latency — no OS jitter, predictable timing for real-time FHE services
  • Power efficiency — 5–10× better perf/W vs GPU for NTT workloads
  • Reconfigurability — adapt n, q chain, or scheme per workload

Architecture: NTT unit

  • Radix-2 DIT butterfly with configurable twiddle factor ROM
  • Fully pipelined: one butterfly per cycle at 300–500 MHz
  • Multi-bank memory: n/2 parallel butterfly lanes (VU13P: 128 lanes)

Published results

PlatformOperationThroughput
Xilinx Alveo U250NTT n=2¹⁴1.3M NTTs/s
Intel Stratix 10CKKS HomMul35ms
Xilinx VCU118TFHE boot6ms
Alveo U280BGV HomMul8ms

Key FPGA papers

  • HEAX (Microsoft, ISCA 2020) — CKKS on Stratix 10
  • CraterLake (MIT, ISCA 2022) — heterogeneous FHE FPGA
  • FAB (Berkeley, ISCA 2022) — bootstrapping FPGA

FHE ASIC — Dedicated Silicon

ASIC vs FPGA for FHE

  • ASIC: 10–100× better perf/area than FPGA due to custom memory hierarchy optimised for FHE data access patterns
  • On-chip SRAM eliminates DRAM bandwidth bottleneck (1–10 TB/s vs 2 TB/s for HBM)
  • Custom datapath: 128-bit or 256-bit SIMD over RNS limbs
  • Fixed cost: parameter changes require full respin

Published FHE ASICs / chips

  • F1 (MIT, MICRO 2021) — first FHE ASIC; CKKS 5400× over CPU; 6nm design point
  • BTS (SNU, ISSCC 2022) — CKKS bootstrapping on chip; 22nm; 2 GB SRAM
  • CraterLake (MIT, ISCA 2022) — generalised FHE accelerator
  • Poseidon (ETH, 2023) — TFHE accelerator; 5nm; <1ms gate-boot
  • Intel HERACLES — disclosed 2023; tiled NTT + key-switch engines

Bottleneck analysis — where silicon wins

FHE is fundamentally memory-bandwidth limited. Key switching materialises 20–50 GB of intermediate data per evaluation. ASICs embed SRAM close to NTT compute clusters, slashing off-chip traffic. At 6nm with 500 MB SRAM, F1 achieves >10 TB/s effective internal bandwidth.

Code: NTT Butterfly Unit (SystemVerilog)

// Cooley-Tukey NTT butterfly — negacyclic, Barrett reduction
// Parameters: N=poly_degree, Q=RNS prime (~60-bit), twiddle from ROM
module ntt_butterfly #(
  parameter int N     = 4096,
  parameter int Q     = 32'd1073479681,   // NTT-friendly prime: Q-1 divisible by 2N
  parameter int W     = 30               // Barrett k-factor log2
)(
  input  logic        clk, rst_n,
  input  logic        valid_in,
  input  logic [63:0] a_in, b_in,        // two polynomial coefficients
  input  logic [63:0] w,                 // twiddle factor W^i mod Q from ROM
  output logic        valid_out,
  output logic [63:0] a_out, b_out
);
  // Registered pipeline: 3 stages for multiply-reduce
  logic [127:0] wb_full;
  logic [63:0]  wb_red;    // w * b mod Q via Barrett
  logic         v1, v2;

  // Stage 1: multiply
  always_ff @(posedge clk) begin
    wb_full <= w * b_in;              // 64×64 → 128-bit
    v1      <= valid_in;
  end

  // Stage 2: Barrett reduction (128→64-bit mod Q)
  barrett_reduce #(.Q(Q), .K(W)) u_bar (
    .clk(clk), .in(wb_full), .out(wb_red), .valid_in(v1), .valid_out(v2)
  );

  // Stage 3: butterfly outputs
  always_ff @(posedge clk) begin
    a_out     <= (a_in + wb_red >= Q) ? a_in + wb_red - Q : a_in + wb_red;
    b_out     <= (a_in - wb_red + Q >= Q) ? a_in - wb_red : a_in - wb_red + Q;
    valid_out <= v2;
  end
endmodule

Fully pipelined 3-cycle butterfly. At 400 MHz with N=4096 and L=8 RNS limbs, a complete NTT forward pass takes ≈ 50k cycles = 125μs. Instantiate 64 butterflies per limb for 2μs NTT throughput.

Real-World Applications

Private ML inference, genomics, regulated finance, multi-party computation, and emerging use cases

Encrypted Neural Network Inference

Problem setup

A healthcare client wants to run a diagnostic ML model hosted by a cloud provider, without revealing patient data. The cloud should learn nothing about the input or output.

CKKS NN mapping

  • Linear layers: encrypted matrix-vector multiply via slot rotations + inner product
  • Activation (ReLU): polynomial approximation (Chebyshev) — degree 15–27 typically sufficient for ResNet/VGG
  • BatchNorm: absorbed into linear weights (pre-compiled)
  • Softmax: approximated; argmax via comparison circuits (expensive)

Published results

NetworkSchemeAccuracyLatency
LeNet-5 (MNIST)CKKS99.0%~2s CPU
ResNet-20 (CIFAR)CKKS91.8%~6min CPU
BERT (NLP, 1 layer)CKKS+TFHE96%~90s CPU
ResNet-20 (CIFAR)CKKS+GPU91.8%<30s A100

Practical toolkits

  • CryptoNets (Microsoft 2016) — first encrypted NN demo
  • nGraph-HE (Intel) — TensorFlow + SEAL
  • Zama Concrete ML — scikit-learn + TFHE

Genomics, Finance & Private Queries

Private Genomics

  • GWAS — genome-wide association study over encrypted SNP data: compute χ² statistics using encrypted integer BGV
  • PRS (Polygenic Risk Score) — encrypted inner product of genotype × weights (CKKS, one ciphertext per chromosome)
  • iDASH competition — annual benchmark; 2021 winner: PRS on 245k SNPs in 2 min (CKKS+GPU)
  • Microsoft SEAL used in Azure Genomics pilot (2021)

Regulated Finance

  • Private credit scoring — bank evaluates client's encrypted financial features without learning them (CKKS logistic regression)
  • Dark pool matching — encrypted order book matching; buyer/seller reveal nothing until match (TFHE comparisons)
  • Risk aggregation — Duality Technologies: FHE-based regulatory reporting across banks

Private database queries

  • PIR (Private Information Retrieval) — retrieve DB record without server knowing which row
  • CKKS + BFV for encrypted SQL-like aggregates
  • Spiral (Washington, 2022) — practical PIR at GB scale

FHE + Multi-Party Computation

Why combine?

FHE gives computation on encrypted data; MPC gives distributed trust — no single party holds the full decryption key. Together: verifiable, distributed encrypted computation.

Threshold FHE

  • Split secret key sk among N parties: sk = sk₁ + sk₂ + … + skₙ
  • Decrypt only with t-of-n parties cooperating (Shamir sharing)
  • Used in: Lattigo multiparty, OpenFHE multiparty extension
  • Applications: privacy-preserving ML training across hospitals with no central authority

FHE + ZK Proofs

Combine FHE with Zero-Knowledge Proofs to prove correctness of computation on encrypted data — server proves it executed the agreed function without cheating.

  • SNARK-based FHE verifiability (Fiore et al. 2014+)
  • Verifiable FHE (vFHE): prover generates succinct proof alongside FHE output
  • Emerging: NTT shared between FHE and PLONK/Groth16

Applications

  • Private federated learning with verified aggregation
  • Auditable encrypted cloud databases
  • Private on-chain smart contracts (FHE blockchains)

Security & Standardisation

Security models, RLWE hardness assumptions, the HE-Std parameter tables, and FHE in the NIST PQC transition

FHE Security Model

IND-CPA security

All FHE schemes above target IND-CPA (indistinguishability under chosen-plaintext attack). An adversary given Enc(m₀) and Enc(m₁) cannot determine which is which with non-negligible advantage.

Warning: Standard FHE is NOT IND-CCA2 — ciphertext malleability is inherent (homomorphic operations modify ciphertext). Use MACs or signatures to authenticate computation results.

CKKS "leakage" controversy

Li & Micciancio (2021) showed CKKS decryption reveals information beyond the intended plaintext when used in certain interactive protocols — approximate decryption leaks exact-ish values. Mitigations:

  • Smudging noise: add large noise before revealing decryption
  • Use CKKS only in non-interactive (client decrypts) settings
  • Li-Micciancio noise flooding countermeasure in OpenFHE

Quantum security

RLWE hardness is believed quantum-resistant — no known quantum speedup beyond quadratic for lattice problems (Shor's algorithm doesn't apply). Same assumption underlies ML-KEM/ML-DSA.

Standardisation Landscape

HomomorphicEncryption.org (HE-Std)

Community white paper (2018, updated 2021) produced by Microsoft, IBM, Intel, Google, academia. Defines:

  • Standardised API for BFV, BGV, CKKS, TFHE
  • Concrete security parameter sets at 128/192/256-bit levels
  • Security estimator tool (lattice-estimator)
  • Reference encoding conventions

ISO/IEC 18033-6 (2019)

International standard for partially homomorphic encryption (Paillier, El Gamal). FHE standardisation is ongoing — NIST has not yet issued FHE-specific FIPS, though NIST IR 8481 (2023) surveys FHE for government use.

FHEW/TFHE in NIST context

FHEW and TFHE build on the same RLWE/LWE assumptions as FIPS 203 (ML-KEM). As PQC migration proceeds, FHE parameter selection will align with NIST SP 800-186 lattice security levels.

lattice-estimator (Albrecht et al.)

Open-source Sage/Python tool for concrete security estimation of RLWE instances. Given (n, q, σ), outputs best-known attack costs (BKZ, uSVP, primal/dual hybrid). Used by all major FHE libraries to validate parameters.

Scheme Selector — Which FHE for Your Task?

Performance: FHE Today vs Tomorrow

OperationSchemeCPU (now)GPU A100ASIC target
HomMul (1 level)CKKS n=2¹⁵380ms0.4ms<0.1ms
Bootstrap (4096 slots)CKKS n=2¹⁴2s50ms<5ms
Gate bootstrapTFHE8ms0.8ms<0.1ms
ResNet-20 inferenceCKKS6 min30s<3s
PRS (245k SNPs)CKKS2 min5s<1s

1000× gap vs plaintext

Best-case CKKS HomMul is still ~1000× slower than plaintext float32 multiply. Hardware acceleration narrows this to ~10–50× for many workloads — acceptable for high-value, privacy-critical applications.

Trend: closing fast

  • 2009 (Gentry): theoretical only
  • 2012: hours per AES block
  • 2018: minutes per NN inference
  • 2022 (F1 ASIC): milliseconds
  • 2026+: real-time encrypted ML

Limitations & Open Problems

Current limitations

  • No native comparison — equality/order tests require expensive bit-decomposition or polynomial approximation in CKKS
  • No native branching — FHE circuits must be branch-free (TFHE mitigates via bit-level gates)
  • Large ciphertext expansion — CKKS expands 1 float to ~50–200 KB ciphertext
  • Key material size — evaluation keys (relin + galois) can be tens of GB for large n
  • Parameter tuning complexity — non-expert users struggle; compilers help but don't solve all cases

Open research problems

  • Efficient bootstrapping — target <1ms per CKKS bootstrap at high precision
  • Transciphering — convert AES-encrypted stream to FHE ciphertext homomorphically (reduces client bandwidth)
  • FHE-native NNs — train networks with polynomial activations that are FHE-efficient from the start
  • Multiparty FHE at scale — distributed decryption without trusted dealer
  • Functional encryption — FHE generalisation with restricted decryption per function

FHE Ecosystem (2025)

Commercial

  • Zama — Concrete / fhEVM; FHE blockchain, ML
  • Duality Technologies — OpenFHE commercialisation; finance/healthcare
  • Cornami — FHE streaming ASIC
  • Microsoft — SEAL, HEAX, Azure Confidential Computing FHE
  • Intel — HERACLES chip, HEXL acceleration library
  • IBM — HElib, homomorphic encryption services

Research groups

  • MIT CSAIL — F1, CraterLake, Cheetah
  • Stanford — Gentry group; FHE theory
  • Seoul Nat'l Univ — HEAAN/CKKS; BTS chip
  • EPFL — Lattigo; multiparty FHE
  • NIST / DHS — NIST IR 8481 FHE evaluation
  • ETH Zürich — Poseidon TFHE chip

Standards / consortia

  • HomomorphicEncryption.org — parameter standards
  • FHE.org — community & annual FHE.org conference
  • OpenFHE governance board

Summary & Future Directions

What we covered

  • Foundations — LWE, RLWE, noise growth, cyclotomic rings
  • Schemes — BGV, BFV, CKKS, TFHE/FHEW: tradeoffs and selection
  • Operations — relinearisation, key switching, RNS, NTT
  • Hardware — GPU (150×), FPGA, ASICs (F1, BTS, Poseidon)
  • Applications — private ML, genomics, finance, MPC+ZK
  • Limits — no branches, ciphertext expansion, key size

Watch this space

  • FHE-native AI chips — dedicated ASIC for real-time encrypted inference
  • FHE blockchains — fully private smart contracts (fhEVM)
  • Transciphering at scale — remove AES pre-processing overhead
  • NIST FHE standardisation — expected late 2020s
  • Quantum + FHE — FHE security assumptions robust to quantum; synergy with QKD?

Series connection

FHE builds on: 07 PQC (RLWE), 01 Finite fields (NTT), 10 Hardware (NTT ASICs), 09 Side-channel (power attacks on FHE chips)

References

Foundational papers

  • Gentry, "A Fully Homomorphic Encryption Scheme," Stanford PhD thesis (2009)
  • Regev, "On Lattices, LWE, Random Linear Codes…" STOC (2005)
  • Lyubashevsky, Peikert, Regev, "On Ideal Lattices and RLWE" EUROCRYPT (2010)
  • Brakerski, Gentry, Vaikuntanathan (BGV), "Fully Homomorphic without Bootstrapping" ITCS (2012)
  • Fan, Vercauteren (BFV), "Somewhat Practical Fully Homomorphic Encryption" (2012)
  • Cheon, Kim, Kim, Song (CKKS), "Homomorphic Encryption for Arithmetic of Approximate Numbers" ASIACRYPT (2017)
  • Ducas, Micciancio (FHEW), "FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second" EUROCRYPT (2015)
  • Chillotti et al. (TFHE), "TFHE: Fast Fully Homomorphic Encryption over the Torus" J.Cryptology (2020)
  • Gentry, Sahai, Waters (GSW) CRYPTO (2013)
  • Li, Micciancio, "On the Security of Homomorphic Encryption on Approximate Numbers" EUROCRYPT (2021)

Hardware & systems

  • Reagen et al. (F1), MICRO 2021
  • Kim et al. (BTS), ISSCC 2022
  • Samardzic et al. (CraterLake), ISCA 2022
  • Agrawal et al. (FAB), ISCA 2022
  • Boura et al. (HEAX/Microsoft), ISCA 2020
  • Viand et al. (Poseidon), ETH 2023
  • Han, Ki, "Better Bootstrapping for CKKS" CT-RSA 2020
  • HE-Std White Paper, HomomorphicEncryption.org (2021)
  • NIST IR 8481 "Getting Ready for Post-Quantum…FHE" (2023)
  • Albrecht et al., "Estimate all the {LWE, NTRU} schemes!" SCN (2018) — lattice-estimator