ARM CORTEX-A · PRESENTATION 05

Security & Virtualization

TrustZone-A · EL2 hypervisor · VHE · PAC · BTI · MTE · RME / CCA

Secure worlds · Stage-2 translation · QARMA · Top-Byte-Ignore · Realms · Granule Protection Table

The Two Halves of this Deck

Isolation primitives

TrustZone-A — Secure vs Non-secure worlds since Armv6 (2004); the original isolation model.
EL2 hypervisor — Armv7-A (2010) Hyp mode, then Armv8-A EL2, then VHE (v8.1).
RME / CCA — Armv9-A (2021) adds Realm & Root worlds; four-world model.

Per-access defences

PAC (Armv8.3) — Pointer Authentication; signs return addresses and other pointers with a short MAC.
BTI (Armv8.5) — Branch Target Identification; defeats JOP.
MTE (Armv8.5) — Memory Tagging; colour-tag pointer & memory, hardware check on each access.
PAN (v8.1), UAO (v8.2), SSBS (v8.5) — incremental kernel hardening.

TrustZone-A — Two Worlds, Four Exception Levels

EL3 is the only level that can toggle SCR_EL3.NS — the bit that switches the PE between Secure and Non-secure worlds.
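Each world has its own EL1&0 translation regime. AArch32 banks key system registers (including the TTBRs) per world; in AArch64, EL3 firmware saves and restores EL1 state on a world switch. TLB entries and cache lines are tagged by NS, so the same physical address is two distinct locations in the two worlds.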
S-EL2 (Armv8.4-A) added a separate Secure hypervisor for isolating trusted-application workloads.
Main use cases: DRM / Widevine L1, secure boot + attestation, fingerprint / face-ID matchers, payment applets, fTPM.

SMC Calling Convention & TF-A

SMC #imm16 — secure monitor call. Traps from EL1/EL2 (any world) to EL3.
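Argument layout (SMCCC v1.2+, SMC64): X0 = function ID, X1-X17 = arguments, X0-X17 = results.
Function ID bit fields: bit 31 Fast/Yielding, bit 30 SMC32/SMC64, bits 29:24 owning entity (Arm architecture / CPU / SiP / OEM / Standard Secure Service such as PSCI / hypervisor / Trusted OS / vendor), bits 15:0 function number.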
Canonical handlers:
- PSCI — Power State Coordination: CPU_ON, CPU_OFF, SYSTEM_RESET.
- SDEI — Software Delegated Exception Interface.
- TRNG — entropy; SMCCC_VERSION / SMCCC_ARCH_FEATURES — feature discovery.
Canonical EL3 implementation is Arm Trusted Firmware-A (TF-A): open-source, BSD-3-Clause licensed, and widely deployed on Cortex-A phones and server boards.
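To see how the function-ID bit fields combine, here is a minimal C sketch that assumes the SMCCC layout listed above (fast bit 31, SMC64 bit 30, owner bits 29:24, function number bits 15:0). The helper names (`smccc_fid`, `fid_owner`, ...) are hypothetical; TF-A and Linux define their own macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Owning-entity numbers (bits 29:24 of an SMCCC function ID). */
enum smccc_owner {
    OWNER_ARM_ARCH   = 0,  /* SMCCC_VERSION, SMCCC_ARCH_FEATURES, ... */
    OWNER_CPU        = 1,
    OWNER_SIP        = 2,  /* silicon-partner specific */
    OWNER_OEM        = 3,
    OWNER_STD_SECURE = 4,  /* Standard Secure Service: PSCI, SDEI, TRNG */
    OWNER_STD_HYP    = 5,
    OWNER_VENDOR_HYP = 6,
    OWNER_VENDOR_EL3 = 7,
    /* 48-49: Trusted Applications, 50-63: Trusted OS */
};

/* Build a function ID. Bits 23:16 are left zero here
 * (bit 16, the SMCCC v1.3 SVE hint, is not modelled). */
static uint32_t smccc_fid(bool fast, bool smc64, unsigned owner, uint16_t fn)
{
    assert(owner < 64);                       /* owner field is 6 bits wide */
    return ((uint32_t)fast << 31) | ((uint32_t)smc64 << 30) |
           ((uint32_t)owner << 24) | fn;
}

/* Decoders for the individual fields. */
static bool     fid_is_fast(uint32_t fid) { return (fid >> 31) & 1u; }
static bool     fid_is_64(uint32_t fid)   { return (fid >> 30) & 1u; }
static unsigned fid_owner(uint32_t fid)   { return (fid >> 24) & 0x3Fu; }
static uint16_t fid_number(uint32_t fid)  { return (uint16_t)(fid & 0xFFFFu); }
```

For example, `smccc_fid(true, true, OWNER_STD_SECURE, 3)` yields `0xC4000003`, the SMC64 PSCI CPU_ON ID, while `0x84000000` decodes as a fast SMC32 call with owner 4 and function 0 (PSCI_VERSION).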

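// Power on a secondary CPU via PSCI CPU_ON
// (what Linux psci_cpu_on() ultimately issues over the SMC conduit)

  ldr   x0, =0xC4000003   // SMCCC fast call, SMC64, Standard Secure Service (PSCI), CPU_ON
  mov   x1, x19           // target_cpu: MPIDR of the core to start (assumed held in x19)
  ldr   x2, =_text_phys   // entry_point_address: physical entry point (placeholder symbol)
  mov   x3, #0            // context_id: handed to the new core in x0
  smc   #0                // trap to EL3
  // on return, x0 = PSCI status (0 = SUCCESS, negative = error)

// EL3 receives the call and routes it to its PSCI handler,
// which powers up the core, points it at the entry address,
// and returns a status code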

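SMCCC v1.3 adds an SVE hint (function-ID bit 16): a caller with no live SVE state beyond the FP/SIMD registers can say so, letting the callee skip saving and restoring the wider Z, P and FFR state across the SMC.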

EL2 Hypervisor — Stage-2 Translation

Guest kernel runs at EL1 believing it has full control. Every guest memory access actually passes through two translation stages:
- Stage 1 (guest): VA → IPA (Intermediate Physical Address).
- Stage 2 (host): IPA → PA — owned by EL2 via VTTBR_EL2.
HCR_EL2 traps: guest's reads/writes to sensitive system registers can be trapped to EL2 for emulation.
VMID tags TLB entries so context-switching between VMs doesn't require flushing.
vGIC — virtual GIC interface. Each VM sees what looks like a GIC; hypervisor context-switches the state.
Timer virtualization — offset registers (CNTVOFF_EL2) give each VM its own view of the counter.
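The two translation stages can be made concrete with a toy C model. It rests on heavy simplifying assumptions: single-level tables, 4 KB pages, 16-page address spaces, and no VMID-tagged TLB; real hardware also sends the guest's own stage-1 table walks through stage 2. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     16           /* toy address spaces: 16 pages each */
#define INVALID    UINT32_MAX   /* unmapped entry */

/* Toy single-level tables: index = input page number, value = output page. */
struct stage1 { uint32_t ipa_page[NPAGES]; };  /* guest-owned:  VA  -> IPA */
struct stage2 { uint32_t pa_page[NPAGES]; };   /* EL2-owned:    IPA -> PA  */

/* Returns true and writes *pa on success; false models a translation fault. */
static bool translate(const struct stage1 *s1, const struct stage2 *s2,
                      uint64_t va, uint64_t *pa)
{
    uint64_t vpn = va >> PAGE_SHIFT, off = va & ((1u << PAGE_SHIFT) - 1);
    if (vpn >= NPAGES || s1->ipa_page[vpn] == INVALID)
        return false;                       /* stage-1 fault: taken by the guest at EL1 */
    uint32_t ipn = s1->ipa_page[vpn];
    if (ipn >= NPAGES || s2->pa_page[ipn] == INVALID)
        return false;                       /* stage-2 fault: taken by the hypervisor at EL2 */
    *pa = ((uint64_t)s2->pa_page[ipn] << PAGE_SHIFT) | off;
    return true;
}
```

The guest controls only `stage1`; revoking a single `stage2` entry is enough for EL2 to take a page away from the VM without the guest's cooperation.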

Before Armv8.1: split hypervisor

Host Linux ran at EL1, with a small stub (the "lowvisor" in KVM/ARM terminology) at EL2. Every guest trap landed in EL2, then bounced to the host kernel at EL1 and back, swapping EL1 register state each way. Nobody liked it.

VHE (Armv8.1): host IN EL2

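Virtualization Host Extensions: with HCR_EL2.E2H = 1, EL1 system-register names used at EL2 are redirected to the corresponding EL2 registers, and EL2 gains an EL1-style translation regime (two TTBRs, ASIDs). Linux can therefore run directly in EL2 with minimal changes. One-level hypervisor. Default in Linux ≥ 4.6 on VHE-capable cores.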
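One way to picture the E2H redirection is as a lookup from the register name in an MRS/MSR to the register actually reached. The C sketch below is an illustration under stated assumptions: it lists only a handful of redirected registers (the full set is in the Arm ARM), represents registers as strings, and its function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A subset of the EL1 -> EL2 redirections applied at EL2 when HCR_EL2.E2H == 1. */
static const char *const el1_to_el2[][2] = {
    {"SCTLR_EL1", "SCTLR_EL2"},   {"TTBR0_EL1", "TTBR0_EL2"},
    {"TTBR1_EL1", "TTBR1_EL2"},   {"TCR_EL1", "TCR_EL2"},
    {"MAIR_EL1", "MAIR_EL2"},     {"VBAR_EL1", "VBAR_EL2"},
    {"ESR_EL1", "ESR_EL2"},       {"FAR_EL1", "FAR_EL2"},
    {"CONTEXTIDR_EL1", "CONTEXTIDR_EL2"},
    {"CPACR_EL1", "CPTR_EL2"},    {"CNTKCTL_EL1", "CNTHCTL_EL2"},
};

/* *_EL12 aliases exist only with E2H == 1 and reach the real (guest) EL1 registers. */
static const char *const el12_to_el1[][2] = {
    {"SCTLR_EL12", "SCTLR_EL1"},  {"TTBR0_EL12", "TTBR0_EL1"},
    {"TTBR1_EL12", "TTBR1_EL1"},  {"TCR_EL12", "TCR_EL1"},
    {"VBAR_EL12", "VBAR_EL1"},
};

#define COUNT(t) (sizeof (t) / sizeof (t)[0])

static const char *lookup(const char *const (*tab)[2], size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i][0], name) == 0)
            return tab[i][1];
    return NULL;
}

/* Which register does MRS/MSR <name> executed at EL2 reach?
 * Returns NULL when the access would be UNDEFINED in this model. */
static const char *vhe_target(const char *name, bool e2h)
{
    const char *el1 = lookup(el12_to_el1, COUNT(el12_to_el1), name);
    if (el1)
        return e2h ? el1 : NULL;      /* *_EL12 is UNDEFINED without E2H */
    if (!e2h)
        return name;                  /* non-VHE: EL1 names mean EL1 registers */
    const char *el2 = lookup(el1_to_el2, COUNT(el1_to_el2), name);
    return el2 ? el2 : name;
}
```

This is why the same kernel code that writes `SCTLR_EL1` configures its own translation regime whether it boots at EL1 or, with VHE, at EL2; KVM reaches the guest's real EL1 state through the `_EL12` aliases.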

pKVM — Android's Protected KVM

Android 13+ ships protected KVM on Armv8.2+ devices.
pKVM runs a minimal hypervisor at EL2 that isolates "protected VMs" even from the host Linux kernel.
Each protected VM gets memory that the host OS cannot read or write — enforced by Stage-2 page tables that EL2 also applies to the host kernel, plus EL2 control of the IOMMUs (SMMU) that gate device DMA.
Use case: DRM playback, health sensors, biometric enclave — all without the size & attack-surface of TrustZone.
Related: Gunyah (Qualcomm's open-source type-1 hypervisor, used on recent Snapdragons).
On Armv9-A, CCA / Realms cover much of what pKVM does, with the isolation architected consistently across vendors.

Type-1 vs type-2 vs hybrid

Type-1: hypervisor runs bare-metal (Xen, Gunyah). Type-2: on top of an OS (VirtualBox). KVM is hybrid — kernel-integrated but still type-1-like when VHE is enabled.

Sigma-style workloads

The "protected" workload pattern (small VM, strong isolation, measured boot) is where CCA Realms + pKVM + S-EL2 converge over the next few Cortex-A generations.

Pointer Authentication (PAC) — Armv8.3-A

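A pointer is a 64-bit value, but with 48-bit VAs only bits [47:0] are translated; the upper bits must all match bit 55 (unless TBI frees bits [63:56] for a tag).
PAC stores a short MAC in those spare bits (15 bits with 48-bit VAs and TBI off, 7 bits with TBI on; the width depends on VA size), computed from: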
- The pointer value itself
- A secret key (APIAKey / APIBKey / APDAKey / APDBKey / APGAKey — stored in system registers)
- A modifier (e.g., stack pointer for return addresses)
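Signing: PACIA / PACIB (instruction keys A/B), PACDA / PACDB (data keys). Verification: AUTIA / AUTIB / AUTDA / AUTDB. PACGA computes a generic 32-bit MAC with APGAKey.
Combined forms: RETAA / RETAB authenticate LR and return in one instruction.
Cipher: QARMA-64 by default, a lightweight tweakable block cipher (a reflection-based SPN, not a Feistel network) designed for low hardware latency. Implementations may instead use an IMPLEMENTATION DEFINED algorithm, and a later extension adds the cheaper QARMA3 variant.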
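Where the PAC lives, and how wide it is, follows from the VA size and TBI setting; the C sketch below computes both and walks through sign, strip and authenticate. Everything cryptographic is replaced: `toy_mac` is a hypothetical 64-bit mixer standing in for QARMA and provides no security, and a failed authentication is modelled by flipping bit 54 rather than writing the architected error code. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PAC field: bits [63:va_bits], minus bit 55 (selects the TTBR0/TTBR1 half)
 * and minus the tag byte [63:56] when TBI is enabled. */
static uint64_t pac_mask(unsigned va_bits, bool tbi)
{
    assert(va_bits >= 25 && va_bits <= 52);
    uint64_t above_va = ~((1ULL << va_bits) - 1);
    uint64_t usable   = tbi ? (1ULL << 56) - 1 : ~0ULL;
    return above_va & usable & ~(1ULL << 55);
}

static unsigned pac_width(unsigned va_bits, bool tbi)
{
    unsigned n = 0;
    for (uint64_t m = pac_mask(va_bits, tbi); m; m &= m - 1)
        n++;                                   /* count set bits */
    return n;
}

/* HYPOTHETICAL stand-in for QARMA: a non-cryptographic 64-bit mixer. */
static uint64_t toy_mac(uint64_t ptr, uint64_t modifier, uint64_t key)
{
    uint64_t x = ptr ^ (modifier * 0x9E3779B97F4A7C15ULL) ^ key;
    x ^= x >> 31;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 29;
    return x;
}

/* XPACI-like strip: restore the canonical extension of bit 55. */
static uint64_t pac_strip(uint64_t p, unsigned va_bits, bool tbi)
{
    uint64_t m = pac_mask(va_bits, tbi);
    return ((p >> 55) & 1) ? (p | m) : (p & ~m);
}

/* PACIA-like sign: place the (toy) MAC into the PAC field. */
static uint64_t pac_sign(uint64_t p, uint64_t mod, uint64_t key, unsigned va_bits, bool tbi)
{
    uint64_t m = pac_mask(va_bits, tbi), s = pac_strip(p, va_bits, tbi);
    return (s & ~m) | (toy_mac(s, mod, key) & m);
}

/* AUTIA-like check without FEAT_FPAC: success returns the stripped pointer;
 * failure returns a non-canonical pointer, so the later branch faults. */
static uint64_t pac_auth(uint64_t p, uint64_t mod, uint64_t key, unsigned va_bits, bool tbi)
{
    uint64_t s = pac_strip(p, va_bits, tbi);
    return pac_sign(s, mod, key, va_bits, tbi) == p ? s : s ^ (1ULL << 54);
}
```

The same signed pointer fails authentication if either the address bits or the modifier (the SP value) differ, which is what ties a signed return address to its stack frame.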

What it defeats

ROP (Return-Oriented Programming) — an attacker who overwrites a stack return address can't forge a valid MAC, so RETAA's authentication fails and the return faults (immediately at the authentication step with FEAT_FPAC).

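// Function prologue/epilogue with PAC-RET

func:
    paciasp                 // sign LR (x30) with key A, SP as modifier
    stp  x29, x30, [sp,#-16]!
    ...
    ldp  x29, x30, [sp],#16 // reload the (possibly overwritten) LR
    retaa                   // authenticate LR (key A, SP) and return

// If the stacked LR was replaced with an attacker-chosen address,
// authentication fails: the pointer is made non-canonical (error
// code in the PAC bits) and the return takes a translation fault.
// With FEAT_FPAC the authentication itself faults.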

BTI — Branch Target Identification (Armv8.5-A)

JOP (Jump-Oriented Programming) — attacker chains indirect branches (BR/BLR) instead of returns. PAC defeats return hijacking; BTI defeats JOP.
BTI marks legal indirect-branch targets with a new instruction, BTI {c|j|jc}, encoded in the HINT space so it executes as a NOP on cores without BTI.
On a BTI-guarded page, an indirect branch to a non-BTI instruction raises a Branch Target exception.
Modes: c (call), j (jump), jc (either). Each indirect branch has a type (BLR → c, BR → j; BR via x16/x17 is also accepted by c). PACIASP / PACIBSP act as implicit landing pads too.
The toolchain (-mbranch-protection=bti) inserts BTI at function entries and jump-table targets and marks the ELF object BTI-compatible; the kernel or loader then sets the Guarded Page (GP) bit in the stage-1 PTEs for that code.
Turned on in: Android NDK r21+, modern Fedora, Ubuntu 22.04 arm64 system libraries.
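The compatibility rule is a small table from the branch type (PSTATE.BTYPE) to the landing instructions that accept it. The C sketch below is simplified under stated assumptions: it assumes the branch itself executes from a guarded page, ignores PACIASP / PACIBSP as landing pads and the SCTLR_ELx.BTn controls, and uses hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* PSTATE.BTYPE as set by the indirect branch that reaches the target. */
enum btype {
    BTYPE_NONE       = 0,  /* not reached by an indirect branch: no check */
    BTYPE_BR_X16_X17 = 1,  /* BR through x16 or x17 (e.g. PLT veneers) */
    BTYPE_BLR        = 2,  /* indirect call */
    BTYPE_BR         = 3,  /* other indirect jump */
};

/* Instruction found at the branch target. */
enum landing { LAND_OTHER, LAND_BTI, LAND_BTI_C, LAND_BTI_J, LAND_BTI_JC };

/* true = branch allowed; false = Branch Target exception on a guarded page.
 * A plain "BTI" (no operand) accepts no indirect branch at all. */
static bool bti_allows(enum btype bt, enum landing l)
{
    switch (bt) {
    case BTYPE_NONE:       return true;
    case BTYPE_BR_X16_X17: return l == LAND_BTI_C || l == LAND_BTI_J || l == LAND_BTI_JC;
    case BTYPE_BLR:        return l == LAND_BTI_C || l == LAND_BTI_JC;
    case BTYPE_BR:         return l == LAND_BTI_J || l == LAND_BTI_JC;
    }
    return false;
}
```

Function entries get `bti c` because they are reached by BLR (and by BR x16/x17 veneers); jump-table slots reached by BR get `bti j`.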

PAC + BTI together

A fully hardened binary uses both: PAC-RET on function returns, BTI on indirect-branch targets. Together they cover both ROP and JOP.

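// BTI-guarded callee
func:
    bti  c              // landing pad for BLR (and BR x16/x17); other BR faults
    paciasp             // sign return addr (also an implicit BTI c landing pad)
    ...

// BTI ensures: an attacker with an indirect-call
// primitive (BLR x0) cannot land in the middle of
// a function to reach gadgets; only marked entries are legal.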

MTE — Memory Tagging Extension (Armv8.5-A)

Each 16-byte memory granule in DRAM has a 4-bit tag stored in a side-band region. Each pointer has a 4-bit tag in bits [59:56] (enabled by TBI).
On each memory access the CPU compares the pointer tag to the granule tag. Mismatch → fault or async report.
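Malloc assigns a random 4-bit tag to each allocation; free re-tags its granules. A use-after-free or out-of-bounds access then hits a mismatching tag with probability ~15/16.
Modes (SCTLR_EL1.TCF for EL1 accesses, SCTLR_EL1.TCF0 for EL0):
- Async: mismatch recorded in TFSR_EL1 / TFSRE0_EL1 and reported later (Linux checks it on kernel entry/exit) — cheap, production-friendly.
- Sync: fault reported immediately on the offending instruction — debug / heavy.
- Asymmetric (Armv8.7-A): sync on loads, async on stores.
First shipped for real use on Pixel 8 (Tensor G3, whose Cortex-X3 / A715 / A510 cores implement MTE) in 2023. Extensively used by Chrome and the Android runtime for UAF hunting.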
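The tag check can be modelled in a few lines of C. This is a software sketch under stated assumptions: a 256-byte toy heap, tags chosen deterministically instead of by IRG, and pointer tags in bits [59:56] as described above; all names are hypothetical and nothing here touches real MTE hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRANULE    16
#define HEAP_BYTES 256
#define TAG_SHIFT  56                 /* pointer tag lives in bits [59:56] */

/* Toy tag storage: one 4-bit allocation tag per 16-byte granule. */
static uint8_t granule_tag[HEAP_BYTES / GRANULE];

/* Toy tagged pointer: heap offset in the low bits, logical tag in bits [59:56]. */
static uint64_t make_ptr(uint64_t off, unsigned tag) { return ((uint64_t)(tag & 0xF) << TAG_SHIFT) | off; }
static unsigned ptr_tag(uint64_t p)  { return (unsigned)((p >> TAG_SHIFT) & 0xF); }
static uint64_t ptr_addr(uint64_t p) { return p & ((1ULL << TAG_SHIFT) - 1); }

/* Like a loop of STG: colour every granule of [p, p + size) with p's tag. */
static void set_tags(uint64_t p, uint64_t size)
{
    assert(ptr_addr(p) % GRANULE == 0 && ptr_addr(p) + size <= HEAP_BYTES);
    for (uint64_t a = ptr_addr(p); a < ptr_addr(p) + size; a += GRANULE)
        granule_tag[a / GRANULE] = (uint8_t)ptr_tag(p);
}

/* The comparison hardware performs on every tagged load/store. */
static bool tag_check(uint64_t p)
{
    uint64_t a = ptr_addr(p);
    return a < HEAP_BYTES && granule_tag[a / GRANULE] == ptr_tag(p);
}
```

Usage: allocate 32 bytes at offset 32 with tag 3 (`set_tags(make_ptr(32, 3), 32)`), give the neighbour at offset 64 tag 5, and an access 32 bytes past the start fails the check; after "free" retags the block with tag 4, the stale tag-3 pointer fails too.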

// MTE: allocate, use, free (memory mapped with PROT_MTE)

// 1. Random tag for new allocation
irg     x0, x0, xzr          // insert random tag into x0[59:56]
stg     x0, [x0]             // granule tag = pointer tag
stg     x0, [x0, #16]        // repeat for every 16 B (ST2G tags two at once)

// 2. Load/store: tag checked by hardware
ldr     w1, [x0]             // tag match → ok
ldr     w1, [x0, #16]        // tag match → ok
ldr     w1, [x0, #1024]      // beyond the allocation → mismatch (~15/16) → FAULT

// 3. Free: retag the storage so stale pointers mismatch
addg    x2, x0, #0, #1       // x2 = x0 with tag + 1
stg     x2, [x0]             // retag memory
stg     x2, [x0, #16]        // later access through x0 faults (UAF)

MTE vs HWASAN vs ASan

Tool	Mechanism	Overhead	Production?
AddressSanitizer (ASan)	Software shadow memory + redzone checks (GCC/Clang instrumentation)	~2× CPU, 3× memory	Debug only
HWASAN	Software tag check using Top-Byte-Ignore (no MTE HW needed); 8-bit tag compared by compiler instrumentation	~2× CPU (similar to ASan), 10-35% memory	Near-production (Android Fuzz)
MTE (async)	Hardware tag check on every load/store; 4-bit tag	< 5% CPU, ~3% memory (tag storage = 1/32 of DRAM)	Production (Pixel 8+)
MTE (sync)	Same as async but precise	~10-20% CPU	Debug / diagnosis

HWASAN is "MTE in software" — the compiler emits the tag check before each access. It pioneered the tagging approach and is still used on non-MTE Arm hardware.

MTE is the first memory-safety primitive cheap enough to enable at scale in production. Google reports ~25% of Android 0-days were prevented/detected with MTE enabled in Chromium testing.
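The memory side of that cost is plain arithmetic: 4 tag bits per 16-byte granule is 1/32 of DRAM. A minimal C sketch (function name hypothetical) makes the trade-off against wider tags explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of tag storage needed for dram_bytes of tagged memory. */
static uint64_t mte_tag_bytes(uint64_t dram_bytes, unsigned granule_bytes, unsigned tag_bits)
{
    assert(granule_bytes > 0);
    uint64_t granules = dram_bytes / granule_bytes;
    return granules * tag_bits / 8;
}
```

With 8 GiB of DRAM, 16-byte granules and 4-bit tags this gives 256 MiB (3.125%); HWASAN-style 8-bit tags on the same granule would double it.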

RME — Realm Management Extension (Armv9-A)

Four worlds, each with its own view of physical memory:
- Non-secure — host OS + guests (as before)
- Secure — trusted OS (as before)
- Realm — new; isolated confidential-compute guests
- Root — EL3 only; sits above all worlds
Partition enforced by the Granule Protection Table (GPT), a physical-memory firewall: every access is checked against it after address translation (the Granule Protection Check), by the PE's MMU and by the SMMU for device traffic.
New firmware component: the RMM (Realm Management Monitor) runs at R-EL2 in the Realm world and manages Realms on behalf of the Non-secure hypervisor.
AMBA 5 requesters in an RME system carry an NSE bit alongside NS, so {NSE, NS} identifies which of the four physical address spaces a transaction targets.
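The world check reduces to two steps: decode {NSE, NS} into a physical address space (PAS), then compare it with the owner recorded for the target granule. The C sketch below assumes a flat, single-level GPT over a tiny physical range with 4 KB granules (the real GPT is a two-level structure with a configurable granule) and uses symbolic GPI values, not the architected encodings; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four RME physical address spaces. */
enum pas { PAS_SECURE, PAS_NONSECURE, PAS_ROOT, PAS_REALM };

/* {NSE, NS}: 00 Secure, 01 Non-secure, 10 Root, 11 Realm. */
static enum pas pas_from_bits(bool nse, bool ns)
{
    if (!nse)
        return ns ? PAS_NONSECURE : PAS_SECURE;
    return ns ? PAS_REALM : PAS_ROOT;
}

/* Per-granule owner recorded in the GPT (symbolic values only). */
enum gpi { GPI_NO_ACCESS, GPI_SECURE, GPI_NONSECURE, GPI_ROOT, GPI_REALM, GPI_ANY };

static bool gpi_allows(enum gpi g, enum pas p)
{
    switch (g) {
    case GPI_ANY:       return true;
    case GPI_NO_ACCESS: return false;
    case GPI_SECURE:    return p == PAS_SECURE;
    case GPI_NONSECURE: return p == PAS_NONSECURE;
    case GPI_ROOT:      return p == PAS_ROOT;
    case GPI_REALM:     return p == PAS_REALM;
    }
    return false;
}

#define GPT_GRANULE_SHIFT 12   /* 4 KB granules in this toy model */

/* Granule Protection Check: false models a Granule Protection Fault. */
static bool gpc_check(const enum gpi *gpt, uint64_t n_granules, uint64_t pa, enum pas p)
{
    uint64_t idx = pa >> GPT_GRANULE_SHIFT;
    return idx < n_granules && gpi_allows(gpt[idx], p);
}
```

In this model, a Non-secure hypervisor touching a granule delegated to a Realm fails the check even though its own stage-1/stage-2 tables map the address; that is the property that lets a Realm distrust the host.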

CCA — Confidential Compute on Arm

CCA is the architectural umbrella: RME + RMM + attestation protocols + SDK.
Goal: confidential cloud workloads on Arm servers, where even the host hypervisor cannot read guest memory.
Comparable to Intel TDX and AMD SEV-SNP on x86.
Attestation chain: TF-A boot → RMM → Realm VM image hash → signed by silicon key → verifiable by the tenant before sending secrets.
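First silicon: Neoverse V3 / N3 (2024) and Armv9-A flagship phones that choose to enable it (e.g., Dimensity 9300 roadmap). Hyperscalers designing their own Arm server chips (AWS Graviton, Microsoft Cobalt) are the prime prospective consumers, although Graviton 4 (Neoverse V2) and Cobalt 100 (Neoverse N2) predate RME.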

RME vs TrustZone

TrustZone's Secure / Non-secure memory split is typically fixed at boot, by firmware programming the TZASC / TZC address-space controller. RME moves granules between worlds dynamically, per VM, with cryptographic attestation. TrustZone is for device-vendor secrets; Realms are for cloud-tenant secrets.

Use case spectrum

Cloud AI training / inference on customer data · Healthcare / finance tenants · Federated learning · Secure on-device LLM serving · Protected media playback on phones

Kernel-Hardening PSTATE Flags

Flag	Version	Effect	Defeats
PAN (Privileged Access Never)	v8.1-A	Kernel (EL1) data access to EL0 pages faults	Accidental user-pointer deref in kernel
UAO (User Access Override)	v8.2-A	With UAO set, LDTR/STTR at EL1 use EL1 permissions instead of EL0 ones	Lets uaccess helpers also handle kernel addresses (old set_fs(KERNEL_DS)) while PAN stays on
PAN3	v8.7-A	Enhanced PAN (SCTLR_ELx.EPAN): PAN also covers EL0 execute-only pages	Kernel data reads of user execute-only memory
SSBS (Speculative Store Bypass Safe)	v8.5-A	Opt-in per-thread mitigation against Spectre v4	Store → load speculative bypass
DIT (Data Independent Timing)	v8.4-A	Data-oblivious instruction timing	Timing side-channels in crypto
TCO (Tag Check Override)	v8.5-A (MTE)	Temporarily suppress MTE tag-check	Kernel memcpy that can span tags

Linux enables these features when the CPU reports them, through individual arm64 Kconfig options (e.g., CONFIG_ARM64_PAN, CONFIG_ARM64_PTR_AUTH_KERNEL, CONFIG_ARM64_BTI_KERNEL, CONFIG_ARM64_MTE). Recent kernels dropped UAO support after set_fs() was removed.

Spectre / Meltdown on Cortex-A

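Meltdown (CVE-2017-5754) — most Arm cores are not affected because their implementations do not forward data from loads that fail permission checks; the architecture itself does not guarantee this. Exceptions: Cortex-A75 (mitigated with kernel page-table isolation), plus Cortex-A15 / A57 / A72 for the related variant 3a (rogue system-register read).
Spectre v1 (bounds-check bypass) — essentially all speculative cores vulnerable. Mitigation: CSDB / SB barriers + index masking + careful kernel-entry guards.
Spectre v2 (branch target injection) — several cores (e.g., Cortex-A57, A72, A73, A75) affected; mitigated by branch-predictor invalidation through firmware (SMCCC_ARCH_WORKAROUND_1) on kernel entry and context switch. FEAT_CSV2 (optional from v8.0, mandatory in v8.5) advertises that branch predictions cannot be trained across contexts; FEAT_CSV3 does the same for Meltdown-style leakage.
Spectre v4 (speculative store bypass) — mitigated by SSBS (v8.5-A) on a per-thread basis.
Straight-line speculation fixes (SB, or DSB + ISB, after unconditional indirect branches and returns) were added in response to Arm's own security advisories.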
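Index masking for Spectre v1 clamps an index with arithmetic instead of a branch, so a mispredicted bounds check cannot steer a speculative load out of bounds. The C sketch below follows the idea of Linux's `array_index_nospec()` but is not that implementation: plain C gives no guarantee the compiler keeps it branch-free, which is why the arm64 kernel uses inline assembly plus CSDB. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns idx when idx < size, else 0, computed without a conditional branch. */
static size_t index_nospec(size_t idx, size_t size)
{
    size_t mask = (size_t)0 - (size_t)(idx < size);   /* all-ones if in bounds, else 0 */
    return idx & mask;
}

static int table_lookup(const int *table, size_t size, size_t idx)
{
    if (idx >= size)
        return -1;                     /* architectural bounds check */
    idx = index_nospec(idx, size);     /* clamp again for the speculative path */
    return table[idx];
}
```

Even if the CPU speculates past the `idx >= size` check with an out-of-range index, the load uses index 0, so no attacker-chosen address is touched.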

What Arm did architecturally

Added explicit speculation barriers (SB, CSDB), per-thread mitigations (SSBS), and visibility flags (CSV2, CSV3) into ID_AA64PFR0_EL1 so Linux can detect what each core does.
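These features are 4-bit ID fields: CSV2 at ID_AA64PFR0_EL1[59:56] and CSV3 at [63:60], with BT, SSBS and MTE at ID_AA64PFR1_EL1[3:0], [7:4] and [11:8]. The C sketch below decodes them from raw register values; how those values are obtained (MRS at EL1, or the kernel's sanitized view) is outside the sketch, and the struct and function names are hypothetical. A nonzero field means the feature is implemented; higher values indicate extra capability levels.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 4-bit ID-register field starting at bit lsb. */
static unsigned id_field(uint64_t reg, unsigned lsb)
{
    return (unsigned)((reg >> lsb) & 0xF);
}

struct spec_features {
    unsigned csv2, csv3;     /* ID_AA64PFR0_EL1[59:56], [63:60] */
    unsigned bt, ssbs, mte;  /* ID_AA64PFR1_EL1[3:0], [7:4], [11:8] */
};

static struct spec_features decode_features(uint64_t pfr0, uint64_t pfr1)
{
    struct spec_features f = {
        .csv2 = id_field(pfr0, 56),
        .csv3 = id_field(pfr0, 60),
        .bt   = id_field(pfr1, 0),
        .ssbs = id_field(pfr1, 4),
        .mte  = id_field(pfr1, 8),
    };
    return f;
}
```

A kernel can use such fields to skip a mitigation on cores that advertise CSV2 / CSV3, which is where the near-zero overhead on newer cores comes from.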

The cost

On Cortex-A55 the kernel-entry mitigations cost ~5% syscall throughput. On modern X-class cores with CSV2/CSV3 mandatory, the overhead is near-zero.

Lessons

"TrustZone vs Realm vs Hypervisor — which do I pick?" → TrustZone for device-vendor secrets (fingerprint, DRM). Hypervisor for standard multi-tenant VMs. Realm (CCA) when the tenant doesn't trust the hypervisor — cloud confidential compute.
"How does PAC defeat ROP?" → signs return addresses (and other pointers) with a key held in a system register plus a modifier such as SP; an attacker without the key cannot forge a valid signature, so authenticating a tampered pointer fails and using it faults.
"Why does MTE use 4 bits?" → 4 bits fits in the pointer's unused top byte and in a reasonable DRAM side-band (8 GB → 256 MB tag memory at 4 bits / 16 B granule).

"VHE — what changed?" → with HCR_EL2.E2H = 1, EL1 register names used at EL2 are redirected to the EL2 registers, and EL2 gains an EL1-style translation regime. Lets Linux run in EL2 with minimal changes; removes split-mode KVM overhead. Armv8.1-A feature.
"SMC vs HVC vs SVC?" → SVC traps EL0→EL1 (syscall). HVC traps EL1→EL2 (hypercall). SMC traps EL1/EL2→EL3 (monitor call / PSCI); it is UNDEFINED at EL0.
"Why BTI in addition to PAC?" → PAC protects returns (via signed LR); BTI protects forward indirect branches (JOP). Together they cover both halves of ROP/JOP attacks.
"What's the Granule Protection Table?" → physical-memory firewall with a configurable granule (4 KB, 16 KB or 64 KB), checked after translation by the PE and the SMMU on every access. Enforces the four-world RME separation.
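When SVC, HVC or SMC traps, the handler tells them apart by the exception class in ESR_ELx[31:26] (0x15 SVC, 0x16 HVC, 0x17 SMC for AArch64 callers), with the instruction's #imm16 in ISS[15:0]. A minimal C decoder under that assumption (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ESR_EC_SHIFT 26
#define ESR_EC_MASK  0x3Fu

/* Name the trapping instruction from the ESR exception class. */
static const char *ec_name(uint32_t esr)
{
    switch ((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) {
    case 0x15: return "SVC (AArch64)";
    case 0x16: return "HVC (AArch64)";
    case 0x17: return "SMC (AArch64)";
    default:   return "other";
    }
}

/* The #imm16 operand of SVC/HVC/SMC, reported in ISS[15:0]. */
static uint16_t esr_imm16(uint32_t esr)
{
    return (uint16_t)(esr & 0xFFFFu);
}
```

An SMC that HCR_EL2.TSC traps to EL2 reports the same SMC class there, which is how a hypervisor can intercept a guest's PSCI calls.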

Related Decks

Cortex-A 02 — Exception model + ELs — prerequisites for this deck
Cortex-M 08 — TrustZone for Armv8-M — contrast: banked state vs banked worlds
AMBA 06 — Future of AMBA, incl. RME / CCA hooks
Neoverse Series — server-side CCA deployment context

Recommended external

Arm Trusted Firmware-A repo — real EL3 code
Google Project Zero — excellent PAC bypass analysis
Linux kernel arch/arm64/kernel/cpu_errata.c — catalog of speculative-exec mitigations per core
Android pKVM docs on source.android.com

References

Arm Ltd. — DDI 0487 — A-profile spec: chapters on TrustZone, EL2, PAC, BTI, MTE, RME
Arm Ltd. — Learn the architecture: Providing protection for complex software — PAC/BTI/MTE walkthrough
Arm Ltd. — Arm Confidential Compute Architecture (CCA) white paper — RME + RMM + attestation
Arm Trusted Firmware-A — trustedfirmware.org — open-source EL3 reference
Serebryany, K. et al. — "Memory Tagging and how it improves C/C++ memory safety" (Google Research)
Qualcomm Security White Papers — Gunyah, Secure Processing Unit architecture
Avanzi, R. — "The QARMA block cipher family" (IACR ToSC 2017) — the PAC cipher
Linux kernel — Documentation/arm64/pointer-authentication.rst, mte.rst, bti.rst
Google Project Zero — "Examining Pointer Authentication on the iPhone XS" (2019)

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.