🔒
Modern Cryptography Series · Presentation 08
Fully Homomorphic
Encryption
Compute on ciphertext without decrypting — from Gentry's blueprint through CKKS, TFHE and hardware acceleration to real-world deployment
Lattices
BGV · BFV · CKKS
TFHE · FHEW
Bootstrapping
HW Acceleration
The Holy Grail of Cryptography
"Can we delegate computation to an untrusted party without revealing our data?"
🔴 Traditional Cloud
- Encrypt data at rest & in transit
- Must decrypt to compute
- Cloud provider sees plaintext
- Data breaches expose secrets
🟢 FHE Cloud
- Client encrypts input locally
- Server computes on ciphertext
- Server never sees plaintext
- Client decrypts result only
Enc(x) ⊕ Enc(y) = Enc(x + y) Enc(x) ⊗ Enc(y) = Enc(x · y)
FHE supports arbitrary functions over encrypted data — both addition and multiplication, composable without limit
Homomorphic Encryption Taxonomy
| Class | Operations | Depth | Example |
| PHE — Partially | + or × (not both) | Unlimited | Paillier, ElGamal |
| SHE — Somewhat | + and × | Limited L | BGV (small circuits) |
| LHE — Levelled | + and × | Fixed L (no boot) | BFV, CKKS, BGV (levelled) |
| FHE — Fully | + and × | Unlimited | BGV+Boot, TFHE, FHEW |
Why depth matters
Each multiplication consumes a "level" of noise headroom. Without bootstrapping (refreshing the ciphertext), depth is bounded. FHE = LHE + bootstrapping.
Part I
Mathematical Foundations
Ring Learning With Errors, polynomial rings, and the noise growth that makes FHE both possible and hard
Learning With Errors (LWE)
The LWE Problem
Given many samples (aᵢ, bᵢ) where:
bᵢ = aᵢ · s + eᵢ (mod q)
- s — secret vector ∈ ℤqⁿ
- aᵢ — uniformly random vector ∈ ℤqⁿ
- eᵢ — small error drawn from χ (discrete Gaussian)
Finding s is computationally hard — as hard as worst-case lattice problems (Regev 2005).
Why error?
Without noise, solving for s is simple Gaussian elimination. The small error defeats elimination — errors amplify with every row operation — and no efficient algorithm for the noisy system is known.
Security
Worst-case reductions: GapSVP and SIVP reduce to LWE — believed hard even for quantum computers.
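A toy sampler makes the structure concrete (demo parameters only — real LWE uses n ≥ 512 and a discrete Gaussian χ; here the error is drawn from a tiny bounded set):

```python
# Toy LWE sampler — illustrative parameters only, far too small for security.
import random

n, q = 8, 97                      # dimension and modulus (demo-sized)
s = [random.randrange(q) for _ in range(n)]          # secret vector in Z_q^n

def lwe_sample(s):
    """Return one LWE sample (a, b) with b = <a, s> + e (mod q)."""
    a = [random.randrange(q) for _ in range(n)]      # uniform a in Z_q^n
    e = random.choice([-2, -1, 0, 1, 2])             # small error standing in for chi
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % q
    return a, b, e

a, b, e = lwe_sample(s)
# Without e, b - <a, s> = 0 and enough samples yield s by Gaussian elimination;
# with e, each sample reveals <a, s> only up to a small perturbation.
residual = (b - sum(ai * si for ai, si in zip(a, s))) % q
assert residual in {q - 2, q - 1, 0, 1, 2}           # residual equals e mod q
```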
Ring-LWE — Compact & Efficient
Polynomial Ring
R = ℤ[x] / (xⁿ + 1)
- n — power of 2 (512, 1024, 2048, 4096…)
- Cyclotomic polynomial xⁿ + 1 gives good structure
- Rq = R / qR for ciphertext modulus q
RLWE Sample
(a, b = a·s + e) ∈ Rq × Rq
One polynomial encodes n coefficients — RLWE packs n × more data per sample than plain LWE.
Why xⁿ + 1?
- NTT-friendly — enables O(n log n) polynomial multiplication
- Negacyclic convolution keeps coefficients bounded
- Provably as hard as LWE (Lyubashevsky, Peikert, Regev 2010)
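The negacyclic reduction is easy to see in code — a schoolbook sketch of multiplication in ℤq[x]/(xⁿ+1) with toy parameters, where every wrap past degree n flips the sign because xⁿ ≡ −1:

```python
# Schoolbook multiplication in R_q = Z_q[x]/(x^n + 1): reducing x^n -> -1
# turns the product into a *negacyclic* convolution of the coefficient vectors.

def negacyclic_mul(a, b, q):
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q          # ordinary term
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q  # x^n = -1: wrap with sign flip
    return c

q = 97
# (x) * (x^3) = x^4 ≡ -1 in Z_q[x]/(x^4 + 1)
assert negacyclic_mul([0, 1, 0, 0], [0, 0, 0, 1], q) == [q - 1, 0, 0, 0]
```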
Parameters
| n | log₂ q | Security |
| 1024 | 27 | ~128-bit |
| 2048 | 54 | ~128-bit |
| 4096 | 109 | ~128-bit |
| 8192 | 218 | ~128-bit |
Noise Growth — The Central Challenge
Noise after operations
- Addition: noise grows linearly — eᵣ = e₁ + e₂
- Multiplication: noise grows multiplicatively — eᵣ ≈ e₁ · e₂ · poly(n)
After L sequential multiplications without management: noise ≈ e^(2ᴸ) — doubly exponential. Modulus switching tames this to roughly one fixed factor per level.
Decryption failure
Decryption requires |noise| < q/2. When noise exceeds this bound, decryption produces garbage. This is the hard ceiling on circuit depth for SHE/LHE.
Noise management techniques
- Modulus switching — scale down q after each multiplication to reduce noise magnitude proportionally
- Key switching — refresh keys after relinearisation (eliminates degree-2 ciphertexts)
- Rescaling — CKKS variant: divide by Δ to absorb noise into precision loss
- Bootstrapping — homomorphically decrypt to reset noise to fresh level
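A numeric sketch of the first technique, modulus switching, using a BFV-style toy encoding (no actual encryption — only the noise arithmetic). Scaling a ciphertext from q down to q′ scales the embedded noise by the same factor q′/q, up to a small rounding term:

```python
# Modulus switching, numerically: rescaling a ciphertext component from q to a
# smaller q' shrinks the noise by the factor q'/q (plus small rounding error).
t = 16                             # plaintext modulus
q, q_new = 2**40, 2**30
m, e = 5, 2**25                    # message and a large accumulated noise

c = m * (q // t) + e               # BFV-style toy encoding: c = m*(q/t) + e
c_new = round(c * q_new / q)       # modulus switch: scale down and round

e_new = c_new - m * (q_new // t)   # residual noise after switching
assert abs(e_new) <= e * q_new // q + t   # noise scaled down ~1024x
assert round(c_new * t / q_new) % t == m  # still decrypts to m
```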
Part II
FHE Scheme Landscape
BGV, BFV, CKKS, TFHE, FHEW — design rationale, native plaintext space, and target workloads
Gentry's Blueprint (2009)
The Breakthrough
Craig Gentry (Stanford / IBM) gave the first FHE construction. Key insight: self-referential bootstrapping — use the scheme to evaluate its own decryption circuit.
FHE = SHE with depth(C_dec) + 1 ≤ L
Squashing trick
Original SHE decryption was too deep. Gentry added a hint in the public key that reduces decryption to a sparse-subset-sum — shallow enough to evaluate homomorphically.
Bootstrapping in one line
- Let ct = Enc(m) with high noise
- Publish sk encrypted: ct_sk = Enc(sk)
- Evaluate Dec_sk(ct) homomorphically using ct_sk
- Result: fresh Enc(m) with reset noise
Historical impact
- 2009: Gentry STOC (ideal lattices)
- 2011: BGV — ring-based, practical
- 2012: BFV — scale-invariant variant
- 2015: FHEW — sub-second gate bootstrap
- 2016: TFHE — fast gate-level FHE
- 2017: CKKS — approximate arithmetic
BGV & BFV — Integer Arithmetic
BGV (Brakerski-Gentry-Vaikuntanathan 2011)
- Plaintext space: ℤₜ[x]/(xⁿ+1) — integer polynomial ring modulo plaintext modulus t
- Noise management: modulus switching — drop one modulus level per multiplication
- Multi-level ciphertext modulus: q = q₀·q₁·…·qₗ (RNS representation)
- Efficient for levelled FHE; bootstrapping optional
BFV (Fan-Vercauteren / Brakerski 2012)
- Plaintext space: same — ℤₜ[x]/(xⁿ+1)
- Key difference: noise managed via scale invariance — divide by Δ = q/t after multiplication
- Simpler modulus structure; single large q
- Preferred for: integer lookup tables, exact arithmetic, database queries
BGV vs BFV — practical guidance
BGV is faster when circuit depth L is known at compile time (levelled mode). BFV's simpler structure makes it easier to implement and tune. Both are available in Microsoft SEAL, OpenFHE, and HElib. Choose BGV for deep circuits, BFV for shallow encrypted integer operations.
CKKS — Approximate Real Arithmetic
Cheon-Kim-Kim-Song (2017)
Treats noise as rounding error rather than decryption failure. Enables native floating-point-like operations over encrypted real (and complex) numbers.
Dec(ct) ≈ m (within precision Δ)
- Encode real vector m via SIMD slots using CRT/FFT
- Scaling factor Δ = 2³⁰ or 2⁶⁰ encodes the fractional precision
- Rescaling after × absorbs noise into least-significant precision bits
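Rescaling is ordinary fixed-point arithmetic at heart — a plaintext sketch with an illustrative Δ (no encryption; the encoding and rescale steps mirror what CKKS does to the scaled integers inside a ciphertext):

```python
# CKKS-style fixed-point arithmetic: reals are scaled by Delta into integers;
# multiplying two values multiplies the scales (Delta^2), and rescaling
# divides by Delta, pushing noise into the low-order precision bits.
DELTA = 2**40

def encode(x):  return round(x * DELTA)    # real -> scaled integer
def decode(v):  return v / DELTA

a, b = encode(1.5), encode(2.25)
prod = a * b                               # scale is now DELTA**2
prod_rescaled = (prod + DELTA // 2) // DELTA   # rescale: back to scale DELTA

assert abs(decode(prod_rescaled) - 1.5 * 2.25) < 1e-9
```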
CKKS SIMD packing
A single ciphertext encodes n/2 complex values simultaneously (Galois automorphism rotations allow inter-slot operations).
- n = 8192 → 4096 slots
- n = 16384 → 8192 slots
- Throughput: millions of multiplications per second on encrypted real data
Best for
- Neural network inference on private data
- Private genomics (GWAS, PRS)
- Encrypted statistics and ML training
- Financial risk models
TFHE & FHEW — Fast Gate-Level FHE
FHEW (Ducas-Micciancio 2015)
- Encrypts single bits (or small integers) using plain LWE
- Bootstrapping in well under a second on CPU (~100 ms optimised) — fast enough to be practical
- Key tool: ring GSW external product for blind rotation
- Evaluate any Boolean gate and bootstrap simultaneously
TFHE (Chillotti et al. 2016/2020)
- Torus LWE: coefficients in ℝ/ℤ (the torus 𝕋)
- Bootstrapping in <10ms on CPU, <1ms on GPU
- Programmable bootstrapping: gate plus arbitrary function evaluation (via lookup table) in one step
- Used in: Concrete (Zama), OpenFHE
TFHE vs CKKS — choosing the right tool
| Property | TFHE | CKKS |
| Plaintext | Bits / small integers | Reals / complex (SIMD) |
| Per-gate boot | Yes — automatic | No — amortised over slots |
| Best for | Control flow, exact logic | ML inference, statistics |
| Bootstrapping | <10ms / gate | Seconds / level refresh |
GSW & the External Product
GSW Encryption (Gentry-Sahai-Waters 2013)
Encrypts message μ as a matrix C ≈ μ · G where G is a gadget matrix. Multiplication with another ciphertext uses the gadget decomposition:
C₁ ⊗ C₂ = C₁ · G⁻¹(C₂)
Noise grows linearly in the decomposition bound rather than multiplicatively — key to efficient bootstrapping.
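The gadget decomposition G⁻¹ is just base-B digit extraction — a minimal sketch with toy parameters, showing why every factor in a digit-by-digit product stays below B:

```python
# Gadget decomposition G^{-1}: write a Z_q element in base-B digits. Products
# then involve only digit-sized factors (< B), so noise grows additively in B
# rather than multiplicatively in q.
q, B = 2**30, 2**6
ell = 5                            # number of digits: B**ell = 2**30 >= q

def g_inverse(v):
    """Base-B digits of v: sum(d[i] * B**i) == v, each 0 <= d[i] < B."""
    digits = []
    for _ in range(ell):
        digits.append(v % B)
        v //= B
    return digits

v = 123_456_789
d = g_inverse(v)
assert all(0 <= di < B for di in d)                    # every digit bounded by B
assert sum(di * B**i for i, di in enumerate(d)) == v   # G * G^{-1}(v) = v
```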
Blind Rotation (FHEW/TFHE)
Bootstrap = evaluate f(x) = testVector[x mod N] homomorphically using a polynomial accumulator rotated by encrypted LWE coefficients:
- Initialise ACC = testVector · Xᵇ ∈ RLWE
- For each LWE coefficient aᵢ: ACC ← blind-rotate by aᵢ using RGSW(sᵢ)
- Extract LWE sample from ACC
- Key-switch back to original LWE key
Programmable bootstrapping
The testVector can encode any function f: ℤ_N → ℤ_N — including non-linear activations like ReLU — making TFHE natural for encrypted neural inference.
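The test-vector trick can be checked entirely in the clear — a plaintext sketch (no encryption, no RGSW) of how negacyclic rotation turns a polynomial into a lookup table: multiplying by X^(−m) in ℤ[x]/(x^N + 1) moves coefficient m into the constant slot:

```python
# Core of programmable bootstrapping, in the clear: encode a lookup table f as
# the coefficients of a test polynomial; rotating by X^{-m} in Z[x]/(x^N + 1)
# brings testVector[m] into the constant coefficient.
N = 8
f = lambda x: (3 * x + 1) % N            # any function Z_N -> Z_N
test_vector = [f(i) for i in range(N)]   # testVector[i] = f(i)

def rotate_neg(poly, m):
    """Multiply poly by X^{-m} in Z[x]/(x^N + 1) (negacyclic rotation)."""
    n = len(poly)
    out = [0] * n
    for i, coef in enumerate(poly):
        j = i - m
        out[j % n] = coef if (j % (2 * n)) < n else -coef  # X^{-k} = -X^{n-k}
    return out

for m in range(N):
    acc = rotate_neg(test_vector, m)
    assert acc[0] == f(m)                # constant coefficient reads out f(m)
```

In the real scheme the rotation amount is encrypted, so the accumulator is rotated blindly via RGSW external products; the lookup-table semantics are exactly these.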
Bootstrapping — Noise Reset
CKKS Bootstrapping (Han-Ki 2020)
Three homomorphic steps:
- CoeffToSlot — linear map via DFT-like matrix
- EvalMod — sine approximation to wrap coefficients mod q (polynomial approximation of periodic function)
- SlotToCoeff — inverse linear map
Bootstraps n/2 slots in parallel. Throughput: thousands of slot-bootstraps/sec on one CPU core (8192 slots in ~2 s — see table).
Bootstrapping cost (CKKS)
| n | Slots | Time (CPU) |
| 2¹⁴ | 8192 | ~2s |
| 2¹⁵ | 16384 | ~6s |
| 2¹⁶ | 32768 | ~25s |
GPU and hardware acceleration reduce these by 100–1000× — see Hardware section.
Amortisation
Bootstrapping all n/2 slots simultaneously means cost-per-slot can be <1μs on optimised hardware, enabling practical encrypted ML inference.
Part III
Key Operations & Optimisations
Relinearisation, key switching, rotation, RNS representation and NTT — the building blocks of every FHE library
Relinearisation & Key Switching
The degree problem
Multiplying ciphertexts (c₀, c₁) × (c₀', c₁') produces a degree-2 ciphertext (d₀, d₁, d₂) — decryptable only with s² in the secret key. Each further multiply increases degree.
Relinearisation (Relin)
Uses a relinearisation key rlk = Enc(s²) to homomorphically eliminate d₂, returning to degree-1 at the cost of one extra inner product with rlk.
(d₀, d₁, d₂) + d₂ · rlk → (d₀', d₁')
Key Switching
Reencrypts a ciphertext under a different secret key. Uses a key switching key ksk = Enc_{sk_new}(sk_old). Required after:
- Relinearisation (degree reduction)
- Galois rotations (slot rotation)
- Bootstrapping key change
Cost: one large polynomial multiplication per decomposition digit — typically the bottleneck.
Galois Rotations
Apply the automorphism x ↦ xᵏ (odd k coprime to 2n) to rotate SIMD slots cyclically. Required for inner products and convolutions over plaintext slots.
RNS & NTT — The Computational Engine
Residue Number System (RNS)
Large ciphertext modulus q is decomposed as product of small NTT-friendly primes:
q = q₀ · q₁ · … · q_{L-1} (each ~60 bits)
- All arithmetic stays within 64-bit machine words
- Modular reduction stays single-word (Barrett/Montgomery) — no multi-precision arithmetic
- Modulus switching = drop one qᵢ, rescale
- Used by: OpenFHE, SEAL, HElib (CKKS, BGV, BFV)
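RNS in miniature — two well-known NTT-friendly primes stand in for a real modulus chain; arithmetic happens limb-wise in machine words and the CRT reconstructs the big value only when needed:

```python
# RNS sketch: a value mod q = q0*q1 is represented by its residues
# (v mod q0, v mod q1); add/multiply limb-wise, reconstruct via the CRT.
q0, q1 = 998244353, 1004535809      # two NTT-friendly ~30-bit primes
q = q0 * q1

def to_rns(v):      return (v % q0, v % q1)
def mul_rns(x, y):  return ((x[0] * y[0]) % q0, (x[1] * y[1]) % q1)

def from_rns(r):
    """CRT: the unique v < q with v ≡ r0 (mod q0) and v ≡ r1 (mod q1)."""
    inv_q0 = pow(q0, -1, q1)
    return (r[0] + q0 * ((r[1] - r[0]) * inv_q0 % q1)) % q

a, b = 123_456_789_012, 987_654_321_098
assert from_rns(mul_rns(to_rns(a), to_rns(b))) == (a * b) % q
```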
Number Theoretic Transform (NTT)
Polynomial multiplication in Rq via pointwise product in NTT domain:
a · b = INTT(NTT(a) ⊙ NTT(b))
- O(n log n) vs O(n²) schoolbook
- Each NTT: (n/2) · log₂ n butterfly operations
- Hardware: same butterfly datapath as post-quantum (ML-KEM, ML-DSA)
- NTT is shared IP across FHE, PQC, ZK-SNARKs
Dominant cost breakdown
Key switching ≈ 70% of runtime · NTT ≈ 60% of key-switching time · memory bandwidth is often the true bottleneck at scale
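The identity a·b = INTT(NTT(a) ⊙ NTT(b)) can be verified with a direct O(n²) negacyclic transform over toy parameters — real implementations use Cooley-Tukey butterflies for O(n log n). Evaluation happens at the n roots of xⁿ + 1, the odd powers of a 2n-th root of unity ψ:

```python
# Negacyclic NTT multiplication in Z_q[x]/(x^n + 1): evaluate both polynomials
# at the odd powers of psi (a primitive 2n-th root of unity mod q), multiply
# pointwise, and interpolate back. Direct O(n^2) transforms for clarity.
n, q = 4, 97                 # q ≡ 1 (mod 2n) so a 2n-th root of unity exists
psi = 64                     # primitive 8th root of unity mod 97 (psi^4 = -1)

def ntt(a):
    return [sum(a[i] * pow(psi, (2 * j + 1) * i, q) for i in range(n)) % q
            for j in range(n)]

def intt(A):
    n_inv = pow(n, -1, q)
    return [n_inv * sum(A[j] * pow(psi, -(2 * j + 1) * i, q) for j in range(n)) % q
            for i in range(n)]

def mul_ntt(a, b):
    return intt([x * y % q for x, y in zip(ntt(a), ntt(b))])

# x * x^3 = x^4 ≡ -1 in Z_q[x]/(x^4 + 1)
assert mul_ntt([0, 1, 0, 0], [0, 0, 0, 1]) == [q - 1, 0, 0, 0]
```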
Part IV
FHE Libraries & Ecosystem
Open-source FHE libraries, standards efforts (HE-Std), and the compiler toolchain from high-level program to encrypted circuit
FHE Library Landscape
| Library | Schemes | Language | Focus |
| Microsoft SEAL | BFV, CKKS | C++17, Python | Industry-grade, well-documented |
| OpenFHE | BFV, BGV, CKKS, FHEW, TFHE | C++17 | Research + production; successor to PALISADE |
| HElib | BGV, CKKS | C++ | IBM; first practical FHE library |
| HEAAN | CKKS | C++ | Original CKKS reference (Seoul Nat'l Univ) |
| Concrete | TFHE | Rust + Python | Zama; compiler from Python to FHE circuit |
| Lattigo | BFV, BGV, CKKS, RLWE | Go | Multiparty FHE, cloud-native |
HE-Std / HomomorphicEncryption.org
Industry consortium (Microsoft, IBM, Intel, Google, Stanford, MIT…) producing API standards and security parameter recommendations. Security estimates for RLWE at 128/192/256-bit levels — the basis for all library default parameters.
Code: CKKS Encrypted Dot Product
import seal
from seal import EncryptionParameters, scheme_type, SEALContext
from seal import KeyGenerator, Encryptor, Evaluator, Decryptor, CKKSEncoder
# ── Parameters ──────────────────────────────────────────────────────────────
parms = EncryptionParameters(scheme_type.ckks)
parms.set_poly_modulus_degree(8192) # n = 2^13 → 4096 slots
parms.set_coeff_modulus(seal.CoeffModulus.Create(
8192, [60, 40, 40, 60])) # RNS chain: 60+40+40+60 bits
scale = 2**40 # Δ encodes ~12 decimal digits
ctx = SEALContext(parms)
keygen = KeyGenerator(ctx)
pk, sk = keygen.create_public_key(), keygen.secret_key()
relin_ks = keygen.create_relin_keys()
gal_ks = keygen.create_galois_keys() # for rotations
enc = Encryptor(ctx, pk)
evl = Evaluator(ctx)
dec = Decryptor(ctx, sk)
coder = CKKSEncoder(ctx)
# ── Encode and encrypt two real vectors ─────────────────────────────────────
a = [1.5, 2.3, 0.7, 4.1]
b = [0.5, 1.0, 2.0, 0.25]
pt_a = coder.encode(a, scale)
pt_b = coder.encode(b, scale)
ct_a = enc.encrypt(pt_a)
ct_b = enc.encrypt(pt_b)
# ── Compute encrypted dot product via multiply-then-rotate-and-add ──────────
ct_prod = evl.multiply(ct_a, ct_b)
evl.relinearize_inplace(ct_prod, relin_ks)
evl.rescale_to_next_inplace(ct_prod) # scale: Δ² → Δ
# Sum all slots via log-depth rotate-and-add (rotate the accumulator, not ct_prod)
ct_sum = ct_prod
for i in [1, 2]:
    ct_rot = evl.rotate_vector(ct_sum, i, gal_ks)
    ct_sum = evl.add(ct_sum, ct_rot)
# ── Decrypt ─────────────────────────────────────────────────────────────────
pt_result = dec.decrypt(ct_sum)
result = coder.decode(pt_result)
print(f"Encrypted dot product ≈ {result[0]:.4f}")   # expect ≈ 5.4750
print(f"Plaintext reference = {sum(x*y for x,y in zip(a,b)):.4f}")
Uses Microsoft SEAL Python bindings. All arithmetic — multiply, relinearise, rescale, rotate — runs entirely on ciphertext. The server never sees a or b.
FHE Compiler Toolchain
The compilation problem
Mapping a high-level program to FHE requires:
- Circuit extraction — replace branches, loops with arithmetic gates
- Scheme selection — CKKS for reals, BFV/TFHE for integers/bits
- Parameter generation — n, q chain, scale, levels sufficient for circuit depth
- Operator scheduling — minimise bootstraps via level optimisation
- Slot packing — SIMD batching across data items
Compiler projects
- HEIR (Google, 2024) — MLIR-based FHE compiler; CKKS/CGGI backends, open source
- Concrete (Zama) — Python to TFHE; transparent parameter selection
- EVA (Microsoft) — CKKS vectoriser; automatic rescaling insertion
- HECO — HElib-targeting, slot rotation optimiser
- Cingulata / Parasol — Boolean circuit compilers for FHEW/TFHE
HEIR architecture
Source → MLIR high-level dialect → secret-type propagation → scheme selection → parameter synthesis → LLVM → CPU/GPU/FPGA
Part V
Hardware Acceleration
GPU, FPGA, and ASIC approaches — NTT units, key-switching datapaths, and the state of FHE silicon
GPU Acceleration
Why GPUs?
- NTT and polynomial multiply are embarrassingly parallel — thousands of independent butterfly units map directly to GPU SIMD lanes
- RNS decomposition: all qᵢ operations are independent → SIMD across RNS limbs
- Ciphertexts are tens of MB — larger than CPU caches; GPU HBM (~2 TB/s) keeps the NTT datapath fed where CPU DRAM stalls
GPU libraries
- cuFHE — CUDA TFHE bootstrapping; <1ms/gate
- 100x — Nvidia CKKS on A100: 150× CPU speedup
- OpenFHE-GPU — CUDA backend (RTX 3090)
- PhantomFHE — CKKS CUDA; 25× SEAL CPU
Benchmarks (CKKS, n=2¹⁵)
| Platform | NTT (ms) | KeySwitch (ms) | Boot (s) |
| CPU (single core) | 12 | 380 | 18 |
| CPU (32-core) | 0.5 | 14 | 0.8 |
| RTX 3090 | 0.08 | 1.2 | 0.15 |
| A100 SXM4 | 0.03 | 0.4 | 0.05 |
Sources: Bossuat et al. (2023), PhantomFHE (2024), estimated from literature
FPGA Acceleration
FPGA advantages for FHE
- Custom bit-widths — 27-bit Barrett modular multipliers for each RNS prime exactly fit DSP slices
- Deterministic latency — no OS jitter, predictable timing for real-time FHE services
- Power efficiency — 5–10× better perf/W vs GPU for NTT workloads
- Reconfigurability — adapt n, q chain, or scheme per workload
Architecture: NTT unit
- Radix-2 DIT butterfly with configurable twiddle factor ROM
- Fully pipelined: one butterfly per cycle at 300–500 MHz
- Multi-bank memory: n/2 parallel butterfly lanes (VU13P: 128 lanes)
Published results
| Platform | Operation | Throughput |
| Xilinx Alveo U250 | NTT n=2¹⁴ | 1.3M NTTs/s |
| Intel Stratix 10 | CKKS HomMul | 35ms |
| Xilinx VCU118 | TFHE boot | 6ms |
| Alveo U280 | BGV HomMul | 8ms |
Key FPGA papers
- HEAX (Microsoft, ASPLOS 2020) — CKKS on Stratix 10
- FAB (HPCA 2023) — bootstrappable FHE on FPGA
FHE ASIC — Dedicated Silicon
ASIC vs FPGA for FHE
- ASIC: 10–100× better perf/area than FPGA due to custom memory hierarchy optimised for FHE data access patterns
- On-chip SRAM eliminates DRAM bandwidth bottleneck (1–10 TB/s vs 2 TB/s for HBM)
- Custom datapath: 128-bit or 256-bit SIMD over RNS limbs
- Fixed cost: parameter changes require full respin
Published FHE ASICs / chips
- F1 (MIT, MICRO 2021) — first programmable FHE accelerator design; CKKS 5400× over CPU; 12/14nm design point
- BTS (SNU, ISCA 2022) — CKKS bootstrapping on chip with large on-chip SRAM
- CraterLake (MIT, ISCA 2022) — generalised FHE accelerator
- Poseidon (ETH, 2023) — TFHE accelerator; 5nm; <1ms gate-boot
- Intel HERACLES — disclosed 2023; tiled NTT + key-switch engines
Bottleneck analysis — where silicon wins
FHE is fundamentally memory-bandwidth limited. Key switching materialises 20–50 GB of intermediate data per evaluation. ASICs embed SRAM close to NTT compute clusters, slashing off-chip traffic. At 6nm with 500 MB SRAM, F1 achieves >10 TB/s effective internal bandwidth.
Code: NTT Butterfly Unit (SystemVerilog)
// Cooley-Tukey NTT butterfly — negacyclic, Barrett reduction
// Parameters: N = poly degree, Q = one ~30-bit RNS prime, twiddles from ROM
module ntt_butterfly #(
parameter int N = 4096,
parameter int Q = 32'd1073479681, // NTT-friendly prime: Q-1 divisible by 2N
parameter int W = 30 // Barrett k-factor log2
)(
input logic clk, rst_n,
input logic valid_in,
input logic [63:0] a_in, b_in, // two polynomial coefficients
input logic [63:0] w, // twiddle factor W^i mod Q from ROM
output logic valid_out,
output logic [63:0] a_out, b_out
);
// Registered pipeline: 3 stages for multiply-reduce; a_in delayed to match
logic [127:0] wb_full;
logic [63:0] wb_red;           // w * b mod Q via Barrett
logic [63:0] a_d1, a_d2;       // a_in aligned with wb_red's latency
logic v1, v2;
// Stage 1: multiply
always_ff @(posedge clk) begin
wb_full <= w * b_in;           // 64×64 → 128-bit
a_d1 <= a_in;
v1 <= valid_in;
end
// Stage 2: Barrett reduction (128→64-bit mod Q), one-cycle latency assumed
barrett_reduce #(.Q(Q), .K(W)) u_bar (
.clk(clk), .in(wb_full), .out(wb_red), .valid_in(v1), .valid_out(v2)
);
always_ff @(posedge clk) a_d2 <= a_d1;
// Stage 3: butterfly outputs — conditional subtraction keeps results in [0, Q)
always_ff @(posedge clk) begin
a_out <= (a_d2 + wb_red >= Q) ? a_d2 + wb_red - Q : a_d2 + wb_red;
b_out <= (a_d2 >= wb_red) ? a_d2 - wb_red : a_d2 - wb_red + Q;
valid_out <= v2;
end
endmodule
Fully pipelined 3-cycle butterfly. At 400 MHz, one butterfly per RNS limb processes the (N/2)·log₂N ≈ 24.6k butterflies of an N=4096 forward NTT in ≈ 61μs, with L=8 limbs running in parallel. Instantiating 64 butterflies per limb brings a full NTT under 1μs.
Part VI
Real-World Applications
Private ML inference, genomics, regulated finance, multi-party computation, and emerging use cases
Encrypted Neural Network Inference
Problem setup
A healthcare client wants to run a diagnostic ML model hosted by a cloud provider, without revealing patient data. The cloud should learn nothing about the input or output.
CKKS NN mapping
- Linear layers: encrypted matrix-vector multiply via slot rotations + inner product
- Activation (ReLU): polynomial approximation (Chebyshev) — degree 15–27 typically sufficient for ResNet/VGG
- BatchNorm: absorbed into linear weights (pre-compiled)
- Softmax: approximated; argmax via comparison circuits (expensive)
Published results
| Network | Scheme | Accuracy | Latency |
| LeNet-5 (MNIST) | CKKS | 99.0% | ~2s CPU |
| ResNet-20 (CIFAR) | CKKS | 91.8% | ~6min CPU |
| BERT (NLP, 1 layer) | CKKS+TFHE | 96% | ~90s CPU |
| ResNet-20 (CIFAR) | CKKS+GPU | 91.8% | <30s A100 |
Practical toolkits
- CryptoNets (Microsoft 2016) — first encrypted NN demo
- nGraph-HE (Intel) — TensorFlow + SEAL
- Zama Concrete ML — scikit-learn + TFHE
Genomics, Finance & Private Queries
Private Genomics
- GWAS — genome-wide association study over encrypted SNP data: compute χ² statistics using encrypted integer BGV
- PRS (Polygenic Risk Score) — encrypted inner product of genotype × weights (CKKS, one ciphertext per chromosome)
- iDASH competition — annual benchmark; 2021 winner: PRS on 245k SNPs in 2 min (CKKS+GPU)
- Microsoft SEAL used in Azure Genomics pilot (2021)
Regulated Finance
- Private credit scoring — bank evaluates client's encrypted financial features without learning them (CKKS logistic regression)
- Dark pool matching — encrypted order book matching; buyer/seller reveal nothing until match (TFHE comparisons)
- Risk aggregation — Duality Technologies: FHE-based regulatory reporting across banks
Private database queries
- PIR (Private Information Retrieval) — retrieve DB record without server knowing which row
- CKKS + BFV for encrypted SQL-like aggregates
- Spiral (Washington, 2022) — practical PIR at GB scale
FHE + Multi-Party Computation
Why combine?
FHE gives computation on encrypted data; MPC gives distributed trust — no single party holds the full decryption key. Together: verifiable, distributed encrypted computation.
Threshold FHE
- Split secret key sk additively among n parties: sk = sk₁ + sk₂ + … + skₙ — decryption needs all n partial decryptions
- For t-of-n thresholds, share sk with Shamir secret sharing instead
- Used in: Lattigo multiparty, OpenFHE multiparty extension
- Applications: privacy-preserving ML training across hospitals with no central authority
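The additive n-of-n sharing above can be sketched in a few lines (plaintext arithmetic only — real threshold FHE also adds smudging noise to each partial decryption before release):

```python
# Threshold key sharing (additive, n-of-n) in miniature: the secret key is a
# sum of random shares mod q; each party computes a partial inner product with
# its share, and only the sum of all partials equals <a, s> under the full key.
import random

q, n_dim, parties = 2**30, 4, 3
s = [random.randrange(q) for _ in range(n_dim)]          # full secret key

# Deal shares: s = s_1 + s_2 + s_3 (coordinate-wise mod q)
shares = [[random.randrange(q) for _ in range(n_dim)] for _ in range(parties - 1)]
last = [(si - sum(col)) % q for si, col in zip(s, zip(*shares))]
shares.append(last)

a = [random.randrange(q) for _ in range(n_dim)]          # ciphertext "a" part
partial = lambda sk: sum(ai * ki for ai, ki in zip(a, sk)) % q

# Sum of partial decryptions equals the inner product under the full key
assert sum(partial(sk) for sk in shares) % q == partial(s)
```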
FHE + ZK Proofs
Combine FHE with Zero-Knowledge Proofs to prove correctness of computation on encrypted data — server proves it executed the agreed function without cheating.
- SNARK-based FHE verifiability (Fiore et al. 2014+)
- Verifiable FHE (vFHE): prover generates succinct proof alongside FHE output
- Emerging: NTT shared between FHE and PLONK/Groth16
Applications
- Private federated learning with verified aggregation
- Auditable encrypted cloud databases
- Private on-chain smart contracts (FHE blockchains)
Part VII
Security & Standardisation
Security models, RLWE hardness assumptions, the HE-Std parameter tables, and FHE in the NIST PQC transition
FHE Security Model
IND-CPA security
All FHE schemes above target IND-CPA (indistinguishability under chosen-plaintext attack). An adversary given Enc(m₀) and Enc(m₁) cannot determine which is which with non-negligible advantage.
Warning: Standard FHE is NOT IND-CCA2 — ciphertext malleability is inherent (homomorphic operations modify ciphertext). Use MACs or signatures to authenticate computation results.
CKKS "leakage" controversy
Li & Micciancio (2021) showed that releasing CKKS decryptions in interactive protocols leaks the error term of the approximation — in some settings enough to recover the secret key. Mitigations:
- Smudging noise: add large noise before revealing decryption
- Use CKKS only in non-interactive (client decrypts) settings
- Li-Micciancio noise flooding countermeasure in OpenFHE
Quantum security
RLWE hardness is believed quantum-resistant — no known quantum speedup beyond quadratic for lattice problems (Shor's algorithm doesn't apply). Same assumption underlies ML-KEM/ML-DSA.
Standardisation Landscape
HomomorphicEncryption.org (HE-Std)
Community white paper (2018, updated 2021) produced by Microsoft, IBM, Intel, Google, academia. Defines:
- Standardised API for BFV, BGV, CKKS, TFHE
- Concrete security parameter sets at 128/192/256-bit levels
- Security estimator tool (lattice-estimator)
- Reference encoding conventions
ISO/IEC 18033-6 (2019)
International standard for partially homomorphic encryption (Paillier, ElGamal). FHE standardisation is ongoing — NIST has not yet issued FHE-specific FIPS, though NIST IR 8481 (2023) surveys FHE for government use.
FHEW/TFHE in NIST context
FHEW and TFHE build on the same RLWE/LWE assumptions as FIPS 203 (ML-KEM). As PQC migration proceeds, FHE parameter selection will likely align with the NIST PQC security categories.
lattice-estimator (Albrecht et al.)
Open-source Sage/Python tool for concrete security estimation of RLWE instances. Given (n, q, σ), outputs best-known attack costs (BKZ, uSVP, primal/dual hybrid). Used by all major FHE libraries to validate parameters.
Scheme Selector — Which FHE for Your Task?
Performance: FHE Today vs Tomorrow
| Operation | Scheme | CPU (now) | GPU A100 | ASIC target |
| HomMul (1 level) | CKKS n=2¹⁵ | 380ms | 0.4ms | <0.1ms |
| Bootstrap (4096 slots) | CKKS n=2¹⁴ | 2s | 50ms | <5ms |
| Gate bootstrap | TFHE | 8ms | 0.8ms | <0.1ms |
| ResNet-20 inference | CKKS | 6 min | 30s | <3s |
| PRS (245k SNPs) | CKKS | 2 min | 5s | <1s |
1000× gap vs plaintext
Best-case CKKS HomMul is still ~1000× slower than plaintext float32 multiply. Hardware acceleration narrows this to ~10–50× for many workloads — acceptable for high-value, privacy-critical applications.
Trend: closing fast
- 2009 (Gentry): theoretical only
- 2012: hours per AES block
- 2018: minutes per NN inference
- 2022 (F1 ASIC): milliseconds
- 2026+: real-time encrypted ML
Limitations & Open Problems
Current limitations
- No native comparison — equality/order tests require expensive bit-decomposition or polynomial approximation in CKKS
- No native branching — FHE circuits must be branch-free (TFHE mitigates via bit-level gates)
- Large ciphertext expansion — a single CKKS ciphertext is ~50–200 KB; per-value cost is reasonable only with full slot packing
- Key material size — evaluation keys (relin + galois) can be tens of GB for large n
- Parameter tuning complexity — non-expert users struggle; compilers help but don't solve all cases
Open research problems
- Efficient bootstrapping — target <1ms per CKKS bootstrap at high precision
- Transciphering — convert AES-encrypted stream to FHE ciphertext homomorphically (reduces client bandwidth)
- FHE-native NNs — train networks with polynomial activations that are FHE-efficient from the start
- Multiparty FHE at scale — distributed decryption without trusted dealer
- Functional encryption — FHE generalisation with restricted decryption per function
FHE Ecosystem (2025)
Commercial
- Zama — Concrete / fhEVM; FHE blockchain, ML
- Duality Technologies — OpenFHE commercialisation; finance/healthcare
- Cornami — FHE streaming ASIC
- Microsoft — SEAL, HEAX, Azure Confidential Computing FHE
- Intel — HERACLES chip, HEXL acceleration library
- IBM — HElib, homomorphic encryption services
Research groups
- MIT CSAIL — F1, CraterLake, Cheetah
- Stanford — Gentry group; FHE theory
- Seoul Nat'l Univ — HEAAN/CKKS; BTS chip
- EPFL — Lattigo; multiparty FHE
- NIST / DHS — NIST IR 8481 FHE evaluation
- ETH Zürich — Poseidon TFHE chip
Standards / consortia
- HomomorphicEncryption.org — parameter standards
- FHE.org — community & annual FHE.org conference
- OpenFHE governance board
Summary & Future Directions
What we covered
- Foundations — LWE, RLWE, noise growth, cyclotomic rings
- Schemes — BGV, BFV, CKKS, TFHE/FHEW: tradeoffs and selection
- Operations — relinearisation, key switching, RNS, NTT
- Hardware — GPU (150×), FPGA, ASICs (F1, BTS, Poseidon)
- Applications — private ML, genomics, finance, MPC+ZK
- Limits — no branches, ciphertext expansion, key size
Watch this space
- FHE-native AI chips — dedicated ASIC for real-time encrypted inference
- FHE blockchains — fully private smart contracts (fhEVM)
- Transciphering at scale — remove AES pre-processing overhead
- NIST FHE standardisation — expected late 2020s
- Quantum + FHE — FHE security assumptions robust to quantum; synergy with QKD?
References
Foundational papers
- Gentry, "A Fully Homomorphic Encryption Scheme," Stanford PhD thesis (2009)
- Regev, "On Lattices, LWE, Random Linear Codes…" STOC (2005)
- Lyubashevsky, Peikert, Regev, "On Ideal Lattices and Learning with Errors over Rings" EUROCRYPT (2010)
- Brakerski, Gentry, Vaikuntanathan (BGV), "(Leveled) Fully Homomorphic Encryption without Bootstrapping" ITCS (2012)
- Fan, Vercauteren (BFV), "Somewhat Practical Fully Homomorphic Encryption" (2012)
- Cheon, Kim, Kim, Song (CKKS), "Homomorphic Encryption for Arithmetic of Approximate Numbers" ASIACRYPT (2017)
- Ducas, Micciancio (FHEW), "FHEW: Bootstrapping Homomorphic Encryption in Less Than a Second" EUROCRYPT (2015)
- Chillotti et al. (TFHE), "TFHE: Fast Fully Homomorphic Encryption over the Torus" J.Cryptology (2020)
- Gentry, Sahai, Waters (GSW) CRYPTO (2013)
- Li, Micciancio, "On the Security of Homomorphic Encryption on Approximate Numbers" EUROCRYPT (2021)
Hardware & systems
- Samardzic et al. (F1), MICRO 2021
- Kim et al. (BTS), ISCA 2022
- Samardzic et al. (CraterLake), ISCA 2022
- Agrawal et al. (FAB), HPCA 2023
- Riazi et al. (HEAX/Microsoft), ASPLOS 2020
- Viand et al. (Poseidon), ETH 2023
- Han, Ki, "Better Bootstrapping for CKKS" CT-RSA 2020
- HE-Std White Paper, HomomorphicEncryption.org (2021)
- NIST IR 8481 "Getting Ready for Post-Quantum…FHE" (2023)
- Albrecht et al., "Estimate all the {LWE, NTRU} schemes!" SCN (2018) — lattice-estimator