🔒
Modern Cryptography Series · Presentation 08
Fully Homomorphic
Encryption
Compute on ciphertext without decrypting — from Gentry's blueprint through CKKS, TFHE and hardware acceleration to real-world deployment
Lattices
BGV · BFV · CKKS
TFHE · FHEW
Bootstrapping
HW Acceleration
The Holy Grail of Cryptography
"Can we delegate computation to an untrusted party without revealing our data?"
🔴 Traditional Cloud
- Encrypt data at rest & in transit
- Must decrypt to compute
- Cloud provider sees plaintext
- Data breaches expose secrets
🟢 FHE Cloud
- Client encrypts input locally
- Server computes on ciphertext
- Server never sees plaintext
- Client decrypts result only
Enc(x) ⊕ Enc(y) = Enc(x + y) Enc(x) ⊗ Enc(y) = Enc(x · y)
FHE supports arbitrary functions over encrypted data — both addition and multiplication, composable without limit
Homomorphic Encryption Taxonomy
| Class | Operations | Depth | Example |
| PHE — Partially | + or × (not both) | Unlimited | Paillier, ElGamal |
| SHE — Somewhat | + and × | Limited L | BGV (small circuits) |
| LHE — Levelled | + and × | Fixed L (no boot) | BFV, CKKS, BGV (levelled) |
| FHE — Fully | + and × | Unlimited | BGV+Boot, TFHE, FHEW |
Why depth matters
Each multiplication consumes a "level" of noise headroom. Without bootstrapping (refreshing the ciphertext), depth is bounded. FHE = LHE + bootstrapping.
Part I
Mathematical Foundations
Ring Learning With Errors, polynomial rings, and the noise growth that makes FHE both possible and hard
Learning With Errors (LWE)
The LWE Problem
Given many samples (aᵢ, bᵢ) where:
bᵢ = aᵢ · s + eᵢ (mod q)
- s — secret vector ∈ ℤqⁿ
- aᵢ — uniformly random vector ∈ ℤqⁿ
- eᵢ — small error drawn from χ (discrete Gaussian)
Finding s is computationally hard — as hard as worst-case lattice problems (Regev 2005).
Why error?
Without noise, solving for s is simple Gaussian elimination. The small error defeats elimination — errors amplify with every row operation — and no efficient algorithm for the noisy system is known.
Security
Worst-case reductions: GapSVP and SIVP reduce to LWE — believed hard even for quantum computers.
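A toy sampler makes the structure concrete (demo parameters only — real LWE uses n ≥ 512 and a discrete Gaussian χ; here the error is drawn from a tiny bounded set):

```python
# Toy LWE sampler — illustrative parameters only, far too small for security.
import random

n, q = 8, 97                      # dimension and modulus (demo-sized)
s = [random.randrange(q) for _ in range(n)]          # secret vector in Z_q^n

def lwe_sample(s):
    """Return one LWE sample (a, b) with b = <a, s> + e (mod q)."""
    a = [random.randrange(q) for _ in range(n)]      # uniform a in Z_q^n
    e = random.choice([-2, -1, 0, 1, 2])             # small error standing in for chi
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b, e

a, b, e = lwe_sample(s)
# Without e, b - <a, s> = 0 and enough samples yield s by Gaussian elimination;
# with e, each sample reveals <a, s> only up to a small perturbation.
residual = (b - sum(ai * si for ai, si in zip(a, s))) % q
assert residual in {q - 2, q - 1, 0, 1, 2}           # residual equals e mod q
```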
Ring-LWE — Compact & Efficient
Polynomial Ring
R = ℤ[x] / (xⁿ + 1)
- n — power of 2 (512, 1024, 2048, 4096…)
- Cyclotomic polynomial xⁿ + 1 gives good structure
- Rq = R / qR for ciphertext modulus q
RLWE Sample
(a, b = a·s + e) ∈ Rq × Rq
One polynomial encodes n coefficients — RLWE packs n × more data per sample than plain LWE.
Why xⁿ + 1?
- NTT-friendly — enables O(n log n) polynomial multiplication
- Negacyclic convolution keeps coefficients bounded
- Provably as hard as LWE (Lyubashevsky, Peikert, Regev 2010)
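The negacyclic reduction is easy to see in code — a schoolbook sketch of multiplication in ℤq[x]/(xⁿ+1) with toy parameters, where every wrap past degree n flips the sign because xⁿ ≡ −1:

```python
# Schoolbook multiplication in R_q = Z_q[x]/(x^n + 1): reducing x^n -> -1
# turns the product into a *negacyclic* convolution of the coefficient vectors.

def negacyclic_mul(a, b, q):
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q          # ordinary term
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # x^n = -1: wrap with sign flip
    return c

q = 97
# (x) * (x^3) = x^4 ≡ -1 in Z_q[x]/(x^4 + 1)
assert negacyclic_mul([0, 1, 0, 0], [0, 0, 0, 1], q) == [q - 1, 0, 0, 0]
```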
Parameters
| n | log₂ q | Security |
| 1024 | 27 | ~128-bit |
| 2048 | 54 | ~128-bit |
| 4096 | 109 | ~128-bit |
| 8192 | 218 | ~128-bit |
Noise Growth — The Central Challenge
Noise after operations
- Addition: noise grows linearly — eᵣ = e₁ + e₂
- Multiplication: noise grows multiplicatively — eᵣ ≈ e₁ · e₂ · poly(n)
After L sequential multiplications without management: noise ≈ e^(2ᴸ) — doubly exponential. Modulus switching tames this to roughly one fixed factor per level.
Decryption failure
Decryption requires |noise| < q/2. When noise exceeds this bound, decryption produces garbage. This is the hard ceiling on circuit depth for SHE/LHE.
Noise management techniques
- Modulus switching — scale down q after each multiplication to reduce noise magnitude proportionally
- Key switching — refresh keys after relinearisation (eliminates degree-2 ciphertexts)
- Rescaling — CKKS variant: divide by Δ to absorb noise into precision loss
- Bootstrapping — homomorphically decrypt to reset noise to fresh level
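A numeric sketch of the first technique, modulus switching, using a BFV-style toy encoding (no actual encryption — only the noise arithmetic). Scaling a ciphertext from q down to q′ scales the embedded noise by the same factor q′/q, up to a small rounding term:

```python
# Modulus switching, numerically: rescaling a ciphertext component from q to a
# smaller q' shrinks the noise by the factor q'/q (plus small rounding error).
t = 16                             # plaintext modulus
q, q_new = 2**40, 2**30
m, e = 5, 2**25                    # message and a large accumulated noise

c = m * (q // t) + e               # BFV-style toy encoding: c = m*(q/t) + e
c_new = round(c * q_new / q)       # modulus switch: scale down and round

e_new = c_new - m * (q_new // t)   # residual noise after switching
assert abs(e_new) <= e * q_new // q + t   # noise scaled down ~1024x
assert round(c_new * t / q_new) % t == m  # still decrypts to m
```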
Part II
FHE Scheme Landscape
BGV, BFV, CKKS, TFHE, FHEW — design rationale, native plaintext space, and target workloads
Gentry's Blueprint (2009)
The Breakthrough
Craig Gentry (Stanford / IBM) gave the first FHE construction. Key insight: self-referential bootstrapping — use the scheme to evaluate its own decryption circuit.
FHE = SHE with depth(C_dec) + 1 ≤ L
Squashing trick
Original SHE decryption was too deep. Gentry added a hint in the public key that reduces decryption to a sparse-subset-sum — shallow enough to evaluate homomorphically.
Bootstrapping in one line
- Let ct = Enc(m) with high noise
- Publish sk encrypted: ct_sk = Enc(sk)
- Evaluate Dec_sk(ct) homomorphically using ct_sk
- Result: fresh Enc(m) with reset noise
Historical impact
- 2009: Gentry STOC (ideal lattices)
- 2011: BGV — ring-based, practical
- 2012: BFV — scale-invariant variant
- 2015: FHEW — sub-second gate bootstrap
- 2016: TFHE — fast gate-level FHE
- 2017: CKKS — approximate arithmetic
BGV & BFV — Integer Arithmetic
BGV (Brakerski-Gentry-Vaikuntanathan 2011)
- Plaintext space: ℤₜ[x]/(xⁿ+1) — integer polynomial ring modulo plaintext modulus t
- Noise management: modulus switching — drop one modulus level per multiplication
- Multi-level ciphertext modulus: q = q₀·q₁·…·qₗ (RNS representation)
- Efficient for levelled FHE; bootstrapping optional
BFV (Fan-Vercauteren / Brakerski 2012)
- Plaintext space: same — ℤₜ[x]/(xⁿ+1)
- Key difference: noise managed via scale invariance — divide by Δ = q/t after multiplication
- Simpler modulus structure; single large q
- Preferred for: integer lookup tables, exact arithmetic, database queries
BGV vs BFV — practical guidance
BGV is faster when circuit depth L is known at compile time (levelled mode). BFV's simpler structure makes it easier to implement and tune. Both are available in Microsoft SEAL, OpenFHE, and HElib. Choose BGV for deep circuits, BFV for shallow encrypted integer operations.
CKKS — Approximate Real Arithmetic
Cheon-Kim-Kim-Song (2017)
Treats noise as rounding error rather than decryption failure. Enables native floating-point-like operations over encrypted real (and complex) numbers.
Dec(ct) ≈ m (within precision Δ)
- Encode real vector m via SIMD slots using CRT/FFT
- Scaling factor Δ = 2³⁰ or 2⁶⁰ encodes the fractional precision
- Rescaling after × absorbs noise into least-significant precision bits
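Rescaling is ordinary fixed-point arithmetic at heart — a plaintext sketch with an illustrative Δ (no encryption; the encoding and rescale steps mirror what CKKS does to the scaled integers inside a ciphertext):

```python
# CKKS-style fixed-point arithmetic: reals are scaled by Delta into integers;
# multiplying two values multiplies the scales (Delta^2), and rescaling
# divides by Delta, pushing noise into the low-order precision bits.
DELTA = 2**40

def encode(x):  return round(x * DELTA)    # real -> scaled integer
def decode(v):  return v / DELTA

a, b = encode(1.5), encode(2.25)
prod = a * b                               # scale is now DELTA**2
prod_rescaled = (prod + DELTA // 2) // DELTA   # rescale: back to scale DELTA

assert abs(decode(prod_rescaled) - 1.5 * 2.25) < 1e-9
```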
CKKS SIMD packing
A single ciphertext encodes n/2 complex values simultaneously (Galois automorphism rotations allow inter-slot operations).
- n = 8192 → 4096 slots
- n = 16384 → 8192 slots
- Throughput: millions of multiplications per second on encrypted real data
Best for
- Neural network inference on private data
- Private genomics (GWAS, PRS)
- Encrypted statistics and ML training
- Financial risk models
TFHE & FHEW — Fast Gate-Level FHE
FHEW (Ducas-Micciancio 2015)
- Encrypts single bits (or small integers) using plain LWE
- Bootstrapping in well under a second on CPU (~100 ms optimised) — fast enough to be practical
- Key tool: ring GSW external product for blind rotation
- Evaluate any Boolean gate and bootstrap simultaneously
TFHE (Chillotti et al. 2016/2020)
- Torus LWE: coefficients in ℝ/ℤ (the torus 𝕋)
- Bootstrapping in <10ms on CPU, <1ms on GPU
- Programmable bootstrapping: gate plus arbitrary function evaluation (via lookup table) in one step
- Used in: Concrete (Zama), OpenFHE
TFHE vs CKKS — choosing the right tool
| Property | TFHE | CKKS |
| Plaintext | Bits / small integers | Reals / complex (SIMD) |
| Per-gate boot | Yes — automatic | No — amortised over slots |
| Best for | Control flow, exact logic | ML inference, statistics |
| Bootstrapping | <10ms / gate | Seconds / level refresh |
GSW & the External Product
GSW Encryption (Gentry-Sahai-Waters 2013)
Encrypts message μ as a matrix C ≈ μ · G where G is a gadget matrix. Multiplication with another ciphertext uses the gadget decomposition:
C₁ ⊗ C₂ = C₁ · G⁻¹(C₂)
Noise grows linearly in the decomposition bound rather than multiplicatively — key to efficient bootstrapping.
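The gadget decomposition G⁻¹ is just base-B digit extraction — a minimal sketch with toy parameters, showing why every factor in a digit-by-digit product stays below B:

```python
# Gadget decomposition G^{-1}: write a Z_q element in base-B digits. Products
# then involve only digit-sized factors (< B), so noise grows additively in B
# rather than multiplicatively in q.
q, B = 2**30, 2**6
ell = 5                            # number of digits: B**ell = 2**30 >= q

def g_inverse(v):
    """Base-B digits of v: sum(d[i] * B**i) == v, each 0 <= d[i] < B."""
    digits = []
    for _ in range(ell):
        digits.append(v % B)
        v //= B
    return digits

v = 123_456_789
d = g_inverse(v)
assert all(0 <= di < B for di in d)                    # every digit bounded by B
assert sum(di * B**i for i, di in enumerate(d)) == v   # G * G^{-1}(v) = v
```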
Blind Rotation (FHEW/TFHE)
Bootstrap = evaluate f(x) = testVector[x mod N] homomorphically using a polynomial accumulator rotated by encrypted LWE coefficients:
- Initialise ACC = testVector · Xᵇ ∈ RLWE
- For each LWE coefficient aᵢ: ACC ← blind-rotate by aᵢ using RGSW(sᵢ)
- Extract LWE sample from ACC
- Key-switch back to original LWE key
Programmable bootstrapping
The testVector can encode any function f: ℤ_N → ℤ_N — including non-linear activations like ReLU — making TFHE natural for encrypted neural inference.
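The test-vector trick can be checked entirely in the clear — a plaintext sketch (no encryption, no RGSW) of how negacyclic rotation turns a polynomial into a lookup table: multiplying by X^(−m) in ℤ[x]/(x^N + 1) moves coefficient m into the constant slot:

```python
# Core of programmable bootstrapping, in the clear: encode a lookup table f as
# the coefficients of a test polynomial; rotating by X^{-m} in Z[x]/(x^N + 1)
# brings testVector[m] into the constant coefficient.
N = 8
f = lambda x: (3 * x + 1) % N            # any function Z_N -> Z_N
test_vector = [f(i) for i in range(N)]   # testVector[i] = f(i)

def rotate_neg(poly, m):
    """Multiply poly by X^{-m} in Z[x]/(x^N + 1) (negacyclic rotation)."""
    n = len(poly)
    out = [0] * n
    for i, coef in enumerate(poly):
        j = i - m
        out[j % n] = coef if (j % (2 * n)) < n else -coef  # X^{-k} = -X^{n-k}
    return out

for m in range(N):
    acc = rotate_neg(test_vector, m)
    assert acc[0] == f(m)                # constant coefficient reads out f(m)
```

In the real scheme the rotation amount is encrypted, so the accumulator is rotated blindly via RGSW external products; the lookup-table semantics are exactly these.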
Bootstrapping — Noise Reset
CKKS Bootstrapping (Han-Ki 2020)
Three homomorphic steps:
- CoeffToSlot — linear map via DFT-like matrix
- EvalMod — sine approximation to wrap coefficients mod q (polynomial approximation of periodic function)
- SlotToCoeff — inverse linear map
Bootstraps n/2 slots in parallel. Throughput: thousands of slot-bootstraps/sec on one CPU core (8192 slots in ~2 s — see table).
Bootstrapping cost (CKKS)
| n | Slots | Time (CPU) |
| 2¹⁴ | 8192 | ~2s |
| 2¹⁵ | 16384 | ~6s |
| 2¹⁶ | 32768 | ~25s |
GPU and hardware acceleration reduce these by 100–1000× — see Hardware section.
Amortisation
Bootstrapping all n/2 slots simultaneously means cost-per-slot can be <1μs on optimised hardware, enabling practical encrypted ML inference.
Part III
Key Operations & Optimisations
Relinearisation, key switching, rotation, RNS representation and NTT — the building blocks of every FHE library
Relinearisation & Key Switching
The degree problem
Multiplying ciphertexts (c₀, c₁) × (c₀', c₁') produces a degree-2 ciphertext (d₀, d₁, d₂) — decryptable only with s² in the secret key. Each further multiply increases degree.
Relinearisation (Relin)
Uses a relinearisation key rlk = Enc(s²) to homomorphically eliminate d₂, returning to degree-1 at the cost of one extra inner product with rlk.
(d₀, d₁, d₂) + d₂ · rlk → (d₀', d₁')
Key Switching
Reencrypts a ciphertext under a different secret key. Uses a key switching key ksk = Enc_{sk_new}(sk_old). Required after:
- Relinearisation (degree reduction)
- Galois rotations (slot rotation)
- Bootstrapping key change
Cost: one large polynomial multiplication per decomposition digit — typically the bottleneck.
Galois Rotations
Apply the automorphism x ↦ xᵏ (odd k coprime to 2n) to rotate SIMD slots cyclically. Required for inner products and convolutions over plaintext slots.
RNS & NTT — The Computational Engine
Residue Number System (RNS)
Large ciphertext modulus q is decomposed as product of small NTT-friendly primes:
q = q₀ · q₁ · … · q_{L-1} (each ~60 bits)
- All arithmetic stays within 64-bit machine words
- Modular reduction stays single-word (Barrett/Montgomery) — no multi-precision arithmetic
- Modulus switching = drop one qᵢ, rescale
- Used by: OpenFHE, SEAL, HElib (CKKS, BGV, BFV)
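RNS in miniature — two well-known NTT-friendly primes stand in for a real modulus chain; arithmetic happens limb-wise in machine words and the CRT reconstructs the big value only when needed:

```python
# RNS sketch: a value mod q = q0*q1 is represented by its residues
# (v mod q0, v mod q1); add/multiply limb-wise, reconstruct via the CRT.
q0, q1 = 998244353, 1004535809      # two NTT-friendly ~30-bit primes
q = q0 * q1

def to_rns(v):      return (v % q0, v % q1)
def mul_rns(x, y):  return ((x[0] * y[0]) % q0, (x[1] * y[1]) % q1)

def from_rns(r):
    """CRT: the unique v < q with v ≡ r0 (mod q0) and v ≡ r1 (mod q1)."""
    inv_q0 = pow(q0, -1, q1)
    return (r[0] + q0 * ((r[1] - r[0]) * inv_q0 % q1)) % q

a, b = 123_456_789_012, 987_654_321_098
assert from_rns(mul_rns(to_rns(a), to_rns(b))) == (a * b) % q
```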
Number Theoretic Transform (NTT)
Polynomial multiplication in Rq via pointwise product in NTT domain:
a · b = INTT(NTT(a) ⊙ NTT(b))
- O(n log n) vs O(n²) schoolbook
- Each NTT: (n/2) · log₂ n butterfly operations
- Hardware: same butterfly datapath as post-quantum (ML-KEM, ML-DSA)
- NTT is shared IP across FHE, PQC, ZK-SNARKs
Dominant cost breakdown
Key switching ≈ 70% of runtime · NTT ≈ 60% of key-switching time · memory bandwidth is often the true bottleneck at scale
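The identity a·b = INTT(NTT(a) ⊙ NTT(b)) can be verified with a direct O(n²) negacyclic transform over toy parameters — real implementations use Cooley-Tukey butterflies for O(n log n). Evaluation happens at the n roots of xⁿ + 1, the odd powers of a 2n-th root of unity ψ:

```python
# Negacyclic NTT multiplication in Z_q[x]/(x^n + 1): evaluate both polynomials
# at the odd powers of psi (a primitive 2n-th root of unity mod q), multiply
# pointwise, and interpolate back. Direct O(n^2) transforms for clarity.
n, q = 4, 97                 # q ≡ 1 (mod 2n) so a 2n-th root of unity exists
psi = 64                     # primitive 8th root of unity mod 97 (psi^4 = -1)

def ntt(a):
    return [sum(a[i] * pow(psi, (2 * j + 1) * i, q) for i in range(n)) % q
            for j in range(n)]

def intt(A):
    n_inv = pow(n, -1, q)
    return [n_inv * sum(A[j] * pow(psi, -(2 * j + 1) * i, q) for j in range(n)) % q
            for i in range(n)]

def mul_ntt(a, b):
    return intt([x * y % q for x, y in zip(ntt(a), ntt(b))])

# x * x^3 = x^4 ≡ -1 in Z_q[x]/(x^4 + 1)
assert mul_ntt([0, 1, 0, 0], [0, 0, 0, 1]) == [q - 1, 0, 0, 0]
```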
Part IV
FHE Libraries & Ecosystem
Open-source FHE libraries, standards efforts (HE-Std), and the compiler toolchain from high-level program to encrypted circuit
FHE Library Landscape
| Library | Schemes | Language | Focus |
| Microsoft SEAL | BFV, CKKS | C++17, Python | Industry-grade, well-documented |
| OpenFHE | BFV, BGV, CKKS, FHEW, TFHE | C++17 | Research + production; successor to PALISADE |
| HElib | BGV, CKKS | C++ | IBM; first practical FHE library |
| HEAAN | CKKS | C++ | Original CKKS reference (Seoul Nat'l Univ) |
| Concrete | TFHE | Rust + Python | Zama; compiler from Python to FHE circuit |
| Lattigo | BFV, BGV, CKKS, RLWE | Go | Multiparty FHE, cloud-native |
HE-Std / HomomorphicEncryption.org
Industry consortium (Microsoft, IBM, Intel, Google, Stanford, MIT…) producing API standards and security parameter recommendations. Security estimates for RLWE at 128/192/256-bit levels — the basis for all library default parameters.
Code: CKKS Encrypted Dot Product
import seal
from seal import EncryptionParameters, scheme_type, SEALContext
from seal import KeyGenerator, Encryptor, Evaluator, Decryptor, CKKSEncoder
# ── Parameters ──────────────────────────────────────────────────────────────
parms = EncryptionParameters(scheme_type.ckks)
parms.set_poly_modulus_degree(8192) # n = 2^13 → 4096 slots
parms.set_coeff_modulus(seal.CoeffModulus.Create(
8192, [60, 40, 40, 60])) # RNS chain: 60+40+40+60 bits
scale = 2**40 # Δ encodes ~12 decimal digits
ctx = SEALContext(parms)
keygen = KeyGenerator(ctx)
pk, sk = keygen.create_public_key(), keygen.secret_key()
relin_ks = keygen.create_relin_keys()
gal_ks = keygen.create_galois_keys() # for rotations
enc = Encryptor(ctx, pk)
evl = Evaluator(ctx)
dec = Decryptor(ctx, sk)
coder = CKKSEncoder(ctx)
# ── Encode and encrypt two real vectors ─────────────────────────────────────
a = [1.5, 2.3, 0.7, 4.1]
b = [0.5, 1.0, 2.0, 0.25]
pt_a = coder.encode(a, scale)
pt_b = coder.encode(b, scale)
ct_a = enc.encrypt(pt_a)
ct_b = enc.encrypt(pt_b)
# ── Compute encrypted dot product via multiply-then-rotate-and-add ──────────
ct_prod = evl.multiply(ct_a, ct_b)
evl.relinearize_inplace(ct_prod, relin_ks)
evl.rescale_to_next_inplace(ct_prod) # scale: Δ² → Δ
# Sum all slots via log-depth rotate-and-add (rotate the accumulator, not ct_prod)
ct_sum = ct_prod
for i in [1, 2]:
    ct_rot = evl.rotate_vector(ct_sum, i, gal_ks)
    ct_sum = evl.add(ct_sum, ct_rot)
# ── Decrypt ─────────────────────────────────────────────────────────────────
pt_result = dec.decrypt(ct_sum)
result = coder.decode(pt_result)
print(f"Encrypted dot product ≈ {result[0]:.4f}")   # expect ≈ 5.4750
print(f"Plaintext reference = {sum(x*y for x,y in zip(a,b)):.4f}")
Uses Microsoft SEAL Python bindings. All arithmetic — multiply, relinearise, rescale, rotate — runs entirely on ciphertext. The server never sees a or b.
FHE Compiler Toolchain
The compilation problem
Mapping a high-level program to FHE requires:
- Circuit extraction — replace branches, loops with arithmetic gates
- Scheme selection — CKKS for reals, BFV/TFHE for integers/bits
- Parameter generation — n, q chain, scale, levels sufficient for circuit depth
- Operator scheduling — minimise bootstraps via level optimisation
- Slot packing — SIMD batching across data items
Compiler projects
- HEIR (Google, 2024) — MLIR-based FHE compiler; CKKS/CGGI backends, open source
- Concrete (Zama) — Python to TFHE; transparent parameter selection
- EVA (Microsoft) — CKKS vectoriser; automatic rescaling insertion
- HECO — HElib-targeting, slot rotation optimiser
- Cingulata / Parasol — Boolean circuit compilers for FHEW/TFHE
HEIR architecture
Source → MLIR high-level dialect → secret-type propagation → scheme selection → parameter synthesis → LLVM → CPU/GPU/FPGA
Part V
Hardware Acceleration
GPU, FPGA, and ASIC approaches — NTT units, key-switching datapaths, and the state of FHE silicon
GPU Acceleration
Why GPUs?
- NTT and polynomial multiply are embarrassingly parallel — thousands of independent butterfly units map directly to GPU SIMD lanes
- RNS decomposition: all qᵢ operations are independent → SIMD across RNS limbs
- Ciphertexts are tens of MB — larger than CPU caches; GPU HBM (~2 TB/s) keeps the NTT datapath fed where CPU DRAM stalls
GPU libraries
- cuFHE — CUDA TFHE bootstrapping; <1ms/gate
- 100x — Nvidia CKKS on A100: 150× CPU speedup
- OpenFHE-GPU — CUDA backend (RTX 3090)
- PhantomFHE — CKKS CUDA; 25× SEAL CPU
Benchmarks (CKKS, n=2¹⁵)
| Platform | NTT (ms) | KeySwitch (ms) | Boot (s) |
| CPU (single core) | 12 | 380 | 18 |
| CPU (32-core) | 0.5 | 14 | 0.8 |
| RTX 3090 | 0.08 | 1.2 | 0.15 |
| A100 SXM4 | 0.03 | 0.4 | 0.05 |
Sources: Bossuat et al. (2023), PhantomFHE (2024), estimated from literature
FPGA Acceleration
FPGA advantages for FHE
- Custom bit-widths — 27-bit Barrett modular multipliers for each RNS prime exactly fit DSP slices
- Deterministic latency — no OS jitter, predictable timing for real-time FHE services
- Power efficiency — 5–10× better perf/W vs GPU for NTT workloads
- Reconfigurability — adapt n, q chain, or scheme per workload
Architecture: NTT unit
- Radix-2 DIT butterfly with configurable twiddle factor ROM
- Fully pipelined: one butterfly per cycle at 300–500 MHz
- Multi-bank memory: n/2 parallel butterfly lanes (VU13P: 128 lanes)
Published results
| Platform | Operation | Throughput |
| Xilinx Alveo U250 | NTT n=2¹⁴ | 1.3M NTTs/s |
| Intel Stratix 10 | CKKS HomMul | 35ms |
| Xilinx VCU118 | TFHE boot | 6ms |
| Alveo U280 | BGV HomMul | 8ms |
Key FPGA papers
- HEAX (Microsoft, ASPLOS 2020) — CKKS on Stratix 10
- FAB (HPCA 2023) — bootstrappable FHE on FPGA
FHE ASIC — Dedicated Silicon
ASIC vs FPGA for FHE
- ASIC: 10–100× better perf/area than FPGA due to custom memory hierarchy optimised for FHE data access patterns
- On-chip SRAM eliminates DRAM bandwidth bottleneck (1–10 TB/s vs 2 TB/s for HBM)
- Custom datapath: 128-bit or 256-bit SIMD over RNS limbs
- Fixed cost: parameter changes require full respin
Published FHE ASICs / chips
- F1 (MIT, MICRO 2021) — first programmable FHE accelerator design; CKKS 5400× over CPU; 12/14nm design point
- BTS (SNU, ISCA 2022) — CKKS bootstrapping on chip with large on-chip SRAM
- CraterLake (MIT, ISCA 2022) — generalised FHE accelerator
- Poseidon (ETH, 2023) — TFHE accelerator; 5nm; <1ms gate-boot
- Intel HERACLES — disclosed 2023; tiled NTT + key-switch engines
Bottleneck analysis — where silicon wins
FHE is fundamentally memory-bandwidth limited. Key switching materialises 20–50 GB of intermediate data per evaluation. ASICs embed SRAM close to NTT compute clusters, slashing off-chip traffic. At 6nm with 500 MB SRAM, F1 achieves >10 TB/s effective internal bandwidth.
Code: NTT Butterfly Unit (SystemVerilog)
// Cooley-Tukey NTT butterfly — negacyclic, Barrett reduction
// Parameters: N = poly degree, Q = one ~30-bit RNS prime, twiddles from ROM
module ntt_butterfly #(
parameter int N = 4096,
parameter int Q = 32'd1073479681, // NTT-friendly prime: Q-1 divisible by 2N
parameter int W = 30 // Barrett k-factor log2
)(
input logic clk, rst_n,
input logic valid_in,
input logic [63:0] a_in, b_in, // two polynomial coefficients
input logic [63:0] w, // twiddle factor W^i mod Q from ROM
output logic valid_out,
output logic [63:0] a_out, b_out
);
// Registered pipeline: 3 stages for multiply-reduce; a_in delayed to match
logic [127:0] wb_full;
logic [63:0] wb_red;           // w * b mod Q via Barrett
logic [63:0] a_d1, a_d2;       // a_in aligned with wb_red's latency
logic v1, v2;
// Stage 1: multiply
always_ff @(posedge clk) begin
wb_full <= w * b_in;           // 64×64 → 128-bit
a_d1 <= a_in;
v1 <= valid_in;
end
// Stage 2: Barrett reduction (128→64-bit mod Q), one-cycle latency assumed
barrett_reduce #(.Q(Q), .K(W)) u_bar (
.clk(clk), .in(wb_full), .out(wb_red), .valid_in(v1), .valid_out(v2)
);
always_ff @(posedge clk) a_d2 <= a_d1;
// Stage 3: butterfly outputs — conditional subtraction keeps results in [0, Q)
always_ff @(posedge clk) begin
a_out <= (a_d2 + wb_red >= Q) ? a_d2 + wb_red - Q : a_d2 + wb_red;
b_out <= (a_d2 >= wb_red) ? a_d2 - wb_red : a_d2 - wb_red + Q;
valid_out <= v2;
end
endmodule
Fully pipelined 3-cycle butterfly. At 400 MHz, one butterfly per RNS limb processes the (N/2)·log₂N ≈ 24.6k butterflies of an N=4096 forward NTT in ≈ 61μs, with L=8 limbs running in parallel. Instantiating 64 butterflies per limb brings a full NTT under 1μs.
Part VI
Real-World Applications
Private ML inference, genomics, regulated finance, multi-party computation, and emerging use cases
Encrypted Neural Network Inference
Problem setup
A healthcare client wants to run a diagnostic ML model hosted by a cloud provider, without revealing patient data. The cloud should learn nothing about the input or output.
CKKS NN mapping
- Linear layers: encrypted matrix-vector multiply via slot rotations + inner product
- Activation (ReLU): polynomial approximation (Chebyshev) — degree 15–27 typically sufficient for ResNet/VGG
- BatchNorm: absorbed into linear weights (pre-compiled)
- Softmax: approximated; argmax via comparison circuits (expensive)
Published results
| Network | Scheme | Accuracy | Latency |
| LeNet-5 (MNIST) | CKKS | 99.0% | ~2s CPU |
| ResNet-20 (CIFAR) | CKKS | 91.8% | ~6min CPU |
| BERT (NLP, 1 layer) | CKKS+TFHE | 96% | ~90s CPU |
| ResNet-20 (CIFAR) | CKKS+GPU | 91.8% | <30s A100 |
Practical toolkits
- CryptoNets (Microsoft 2016) — first encrypted NN demo
- nGraph-HE (Intel) — TensorFlow + SEAL
- Zama Concrete ML — scikit-learn + TFHE
Genomics, Finance & Private Queries
Private Genomics
- GWAS — genome-wide association study over encrypted SNP data: compute χ² statistics using encrypted integer BGV
- PRS (Polygenic Risk Score) — encrypted inner product of genotype × weights (CKKS, one ciphertext per chromosome)
- iDASH competition — annual benchmark; 2021 winner: PRS on 245k SNPs in 2 min (CKKS+GPU)
- Microsoft SEAL used in Azure Genomics pilot (2021)
Regulated Finance
- Private credit scoring — bank evaluates client's encrypted financial features without learning them (CKKS logistic regression)
- Dark pool matching — encrypted order book matching; buyer/seller reveal nothing until match (TFHE comparisons)
- Risk aggregation — Duality Technologies: FHE-based regulatory reporting across banks
Private database queries
- PIR (Private Information Retrieval) — retrieve DB record without server knowing which row
- CKKS + BFV for encrypted SQL-like aggregates
- Spiral (Washington, 2022) — practical PIR at GB scale
FHE + Multi-Party Computation
Why combine?
FHE gives computation on encrypted data; MPC gives distributed trust — no single party holds the full decryption key. Together: verifiable, distributed encrypted computation.
Threshold FHE
- Split secret key sk additively among n parties: sk = sk₁ + sk₂ + … + skₙ — decryption needs all n partial decryptions
- For t-of-n thresholds, share sk with Shamir secret sharing instead
- Used in: Lattigo multiparty, OpenFHE multiparty extension
- Applications: privacy-preserving ML training across hospitals with no central authority
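The additive n-of-n sharing above can be sketched in a few lines (plaintext arithmetic only — real threshold FHE also adds smudging noise to each partial decryption before release):

```python
# Threshold key sharing (additive, n-of-n) in miniature: the secret key is a
# sum of random shares mod q; each party computes a partial inner product with
# its share, and only the sum of all partials equals <a, s> under the full key.
import random

q, n_dim, parties = 2**30, 4, 3
s = [random.randrange(q) for _ in range(n_dim)]          # full secret key

# Deal shares: s = s_1 + s_2 + s_3 (coordinate-wise mod q)
shares = [[random.randrange(q) for _ in range(n_dim)] for _ in range(parties - 1)]
last = [(si - sum(col)) % q for si, col in zip(s, zip(*shares))]
shares.append(last)

a = [random.randrange(q) for _ in range(n_dim)]          # ciphertext "a" part
partial = lambda sk: sum(ai * ki for ai, ki in zip(a, sk)) % q

# Sum of partial decryptions equals the inner product under the full key
assert sum(partial(sk) for sk in shares) % q == partial(s)
```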
FHE + ZK Proofs
Combine FHE with Zero-Knowledge Proofs to prove correctness of computation on encrypted data — server proves it executed the agreed function without cheating.
- SNARK-based FHE verifiability (Fiore et al. 2014+)
- Verifiable FHE (vFHE): prover generates succinct proof alongside FHE output
- Emerging: NTT shared between FHE and PLONK/Groth16
Applications
- Private federated learning with verified aggregation
- Auditable encrypted cloud databases
- Private on-chain smart contracts (FHE blockchains)
Part VII
Security & Standardisation
Security models, RLWE hardness assumptions, the HE-Std parameter tables, and FHE in the NIST PQC transition
FHE Security Model
IND-CPA security
All FHE schemes above target IND-CPA (indistinguishability under chosen-plaintext attack). An adversary given Enc(m₀) and Enc(m₁) cannot determine which is which with non-negligible advantage.
Warning: Standard FHE is NOT IND-CCA2 — ciphertext malleability is inherent (homomorphic operations modify ciphertext). Use MACs or signatures to authenticate computation results.
CKKS "leakage" controversy
Li & Micciancio (2021) showed that releasing CKKS decryptions in interactive protocols leaks the error term of the approximation — in some settings enough to recover the secret key. Mitigations:
- Smudging noise: add large noise before revealing decryption
- Use CKKS only in non-interactive (client decrypts) settings
- Li-Micciancio noise flooding countermeasure in OpenFHE
Quantum security
RLWE hardness is believed quantum-resistant — no known quantum speedup beyond quadratic for lattice problems (Shor's algorithm doesn't apply). Same assumption underlies ML-KEM/ML-DSA.
Standardisation Landscape
HomomorphicEncryption.org (HE-Std)
Community white paper (2018, updated 2021) produced by Microsoft, IBM, Intel, Google, academia. Defines:
- Standardised API for BFV, BGV, CKKS, TFHE
- Concrete security parameter sets at 128/192/256-bit levels
- Security estimator tool (lattice-estimator)
- Reference encoding conventions
ISO/IEC 18033-6 (2019)
International standard for partially homomorphic encryption (Paillier, ElGamal). FHE standardisation is ongoing — NIST has not yet issued FHE-specific FIPS, though NIST IR 8481 (2023) surveys FHE for government use.
FHEW/TFHE in NIST context
FHEW and TFHE build on the same RLWE/LWE assumptions as FIPS 203 (ML-KEM). As PQC migration proceeds, FHE parameter selection will likely align with the NIST PQC security categories.
lattice-estimator (Albrecht et al.)
Open-source Sage/Python tool for concrete security estimation of RLWE instances. Given (n, q, σ), outputs best-known attack costs (BKZ, uSVP, primal/dual hybrid). Used by all major FHE libraries to validate parameters.
Scheme Selector — Which FHE for Your Task?
Performance: FHE Today vs Tomorrow
| Operation | Scheme | CPU (now) | GPU A100 | ASIC target |
| HomMul (1 level) | CKKS n=2¹⁵ | 380ms | 0.4ms | <0.1ms |
| Bootstrap (4096 slots) | CKKS n=2¹⁴ | 2s | 50ms | <5ms |
| Gate bootstrap | TFHE | 8ms | 0.8ms | <0.1ms |
| ResNet-20 inference | CKKS | 6 min | 30s | <3s |
| PRS (245k SNPs) | CKKS | 2 min | 5s | <1s |
1000× gap vs plaintext
Best-case CKKS HomMul is still ~1000× slower than plaintext float32 multiply. Hardware acceleration narrows this to ~10–50× for many workloads — acceptable for high-value, privacy-critical applications.
Trend: closing fast
- 2009 (Gentry): theoretical only
- 2012: hours per AES block
- 2018: minutes per NN inference
- 2022 (F1 ASIC): milliseconds
- 2026+: real-time encrypted ML
Limitations & Open Problems
Current limitations
- No native comparison — equality/order tests require expensive bit-decomposition or polynomial approximation in CKKS
- No native branching — FHE circuits must be branch-free (TFHE mitigates via bit-level gates)
- Large ciphertext expansion — a single CKKS ciphertext is ~50–200 KB; per-value cost is reasonable only with full slot packing
- Key material size — evaluation keys (relin + galois) can be tens of GB for large n
- Parameter tuning complexity — non-expert users struggle; compilers help but don't solve all cases
Open research problems
- Efficient bootstrapping — target <1ms per CKKS bootstrap at high precision
- Transciphering — convert AES-encrypted stream to FHE ciphertext homomorphically (reduces client bandwidth)
- FHE-native NNs — train networks with polynomial activations that are FHE-efficient from the start
- Multiparty FHE at scale — distributed decryption without trusted dealer
- Functional encryption — FHE generalisation with restricted decryption per function
FHE Ecosystem (2025)
Commercial
- Zama — Concrete / fhEVM; FHE blockchain, ML
- Duality Technologies — OpenFHE commercialisation; finance/healthcare
- Cornami — FHE streaming ASIC
- Microsoft — SEAL, HEAX, Azure Confidential Computing FHE
- Intel — HERACLES chip, HEXL acceleration library
- IBM — HElib, homomorphic encryption services
Research groups
- MIT CSAIL — F1, CraterLake, Cheetah
- Stanford — Gentry group; FHE theory
- Seoul Nat'l Univ — HEAAN/CKKS; BTS chip
- EPFL — Lattigo; multiparty FHE
- NIST / DHS — NIST IR 8481 FHE evaluation
- ETH Zürich — Poseidon TFHE chip
Standards / consortia
- HomomorphicEncryption.org — parameter standards
- FHE.org — community & annual FHE.org conference
- OpenFHE governance board
Summary & Future Directions
What we covered
- Foundations — LWE, RLWE, noise growth, cyclotomic rings
- Schemes — BGV, BFV, CKKS, TFHE/FHEW: tradeoffs and selection
- Operations — relinearisation, key switching, RNS, NTT
- Hardware — GPU (150×), FPGA, ASICs (F1, BTS, Poseidon)
- Applications — private ML, genomics, finance, MPC+ZK
- Limits — no branches, ciphertext expansion, key size
Watch this space
- FHE-native AI chips — dedicated ASIC for real-time encrypted inference
- FHE blockchains — fully private smart contracts (fhEVM)
- Transciphering at scale — remove AES pre-processing overhead
- NIST FHE standardisation — expected late 2020s
- Quantum + FHE — FHE security assumptions robust to quantum; synergy with QKD?
References
Foundational papers
- Gentry, "A Fully Homomorphic Encryption Scheme," Stanford PhD thesis (2009)
- Regev, "On Lattices, LWE, Random Linear Codes…" STOC (2005)
- Lyubashevsky, Peikert, Regev, "On Ideal Lattices and Learning with Errors over Rings" EUROCRYPT (2010)
- Brakerski, Gentry, Vaikuntanathan (BGV), "(Leveled) Fully Homomorphic Encryption without Bootstrapping" ITCS (2012)
- Fan, Vercauteren (BFV), "Somewhat Practical Fully Homomorphic Encryption" (2012)
- Cheon, Kim, Kim, Song (CKKS), "Homomorphic Encryption for Arithmetic of Approximate Numbers" ASIACRYPT (2017)
- Ducas, Micciancio (FHEW), "FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second" EUROCRYPT (2015)
- Chillotti et al. (TFHE), "TFHE: Fast Fully Homomorphic Encryption over the Torus" J.Cryptology (2020)
- Gentry, Sahai, Waters (GSW) CRYPTO (2013)
- Li, Micciancio, "On the Security of Homomorphic Encryption on Approximate Numbers" EUROCRYPT (2021)
Hardware & systems
- Samardzic et al. (F1), MICRO 2021
- Kim et al. (BTS), ISCA 2022
- Samardzic et al. (CraterLake), ISCA 2022
- Agrawal et al. (FAB), HPCA 2023
- Riazi et al. (HEAX/Microsoft), ASPLOS 2020
- Viand et al. (Poseidon), ETH 2023
- Han, Ki, "Better Bootstrapping for CKKS" CT-RSA 2020
- HE-Std White Paper, HomomorphicEncryption.org (2021)
- NIST IR 8481 "Getting Ready for Post-Quantum…FHE" (2023)
- Albrecht et al., "Estimate all the {LWE, NTRU} schemes!" SCN (2018) — lattice-estimator