ARM CORTEX-A · PRESENTATION 02

Armv8-A / Armv9-A Architecture

Exception Levels · AArch64 · A64 ISA · System Registers · v8.x → v9.x Evolution
EL0 · EL1 · EL2 · EL3 · AArch64 vs AArch32 · SPSR / ELR / VBAR · ESR · SCR / HCR

What is an "Architecture Profile"?

  • The Arm ISA is defined in one massive document — the Arm Architecture Reference Manual (ARM ARM, DDI 0487, ~9000 pages).
  • It splits by profile:
    • A-profile — application; full MMU, four ELs, virtualization.
    • R-profile — real-time; MPU, deterministic.
    • M-profile — microcontroller; no AArch64, NVIC.
  • This deck covers the A-profile: Armv8-A (2011 spec, 2014 silicon) and its "v9" rebrand (2021).

Armv8-A — the clean break

Armv8-A introduced a second execution state, AArch64, with a completely new instruction set (A64), new exception model, new register file. AArch32 was kept for backwards compatibility but is being deprecated piece-by-piece.

v9-A is v8.5-A-plus

Same profile, same execution states. It builds on Armv8.5-A, makes SVE2 the baseline vector extension, carries MTE forward, and adds RME / CCA. Architecturally an extension, marketed as a new major version.


AArch64 vs AArch32 — Two Execution States

Aspect | AArch64 | AArch32
Register file | 31 × 64-bit GPRs (X0–X30) + XZR + SP + PC | 16 × 32-bit (R0–R12, SP, LR, PC) + banked per mode
Instruction encoding | Fixed 32-bit (A64) | Mixed 16/32-bit (Thumb-2) + 32-bit Arm
Condition codes | Only a few instructions are conditional (B.cond, CSEL/CSINC, CCMP). No universal CCs. | Almost every instruction conditional
SIMD / FP | 32 × 128-bit V0–V31 (NEON + FP unified) | 32 × 64-bit D0–D31 (VFPv4/NEON overlay)
Exception entry | To fixed EL; ELR_ELx / SPSR_ELx | To privileged mode; banked SP_usr/svc/irq etc.
Memory model | Weak, with explicit LDAR / STLR + LSE atomics | Weak; no LSE (Armv8.1 LSE is AArch64-only)
Status in 2025 | Universal flagship state | Dropped entirely from recent cores (Cortex-X2, A715, A520 onward); many earlier cores keep it at EL0 only

A PE (Processing Element) selects execution state per Exception Level — e.g. EL1 running AArch64 can still have EL0 running AArch32 user-mode apps on cores that allow it.


The Four Exception Levels

EL0 — Unprivileged (user): applications (userspace)
EL1 — OS kernel: Linux / Windows / XNU kernel
EL2 — Hypervisor: KVM, Xen, Hyper-V, pKVM (Android)
EL3 — Secure Monitor: TF-A (Trusted Firmware-A) · SMC handler

  • EL0 — user mode. Can execute only unprivileged instructions. Traps to EL1 on syscall (SVC).
  • EL1 — OS. Full MMU control (TTBR0/1_EL1, SCTLR_EL1). Traps to EL2 via HVC.
  • EL2 — hypervisor. Stage-2 translation. VTTBR_EL2, HCR_EL2. VHE (Armv8.1-A) lets Linux run directly in EL2.
  • EL3 — secure monitor + world switch. Only level that can change SCR_EL3.NS (Non-secure bit). Runs TF-A or equivalent.
  • Higher EL = more privilege. Exception return via ERET uses ELR_ELx / SPSR_ELx.

Exception Entry & Return

  • Exceptions take the PE from a lower EL to a higher EL (or same EL). They cannot lower privilege.
  • On entry, hardware writes:
    • ELR_ELx ← preferred return address
    • SPSR_ELx ← saved PSTATE (NZCV + D,A,I,F + EL + nRW)
    • ESR_ELx ← exception syndrome (class + IL + ISS)
    • FAR_ELx ← faulting virtual address (if relevant)
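  • PC jumps to VBAR_ELx + offset. The offset is fixed: a group base (0x000 current EL with SP_EL0, 0x200 current EL with SP_ELx, 0x400 lower EL in AArch64, 0x600 lower EL in AArch32) plus a type offset (0x000 synchronous, 0x080 IRQ, 0x100 FIQ, 0x180 SError).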
  • Return: ERET restores PC from ELR and PSTATE from SPSR.

The 16-entry vector table

VBAR points to a 2 KB-aligned table with 16 × 128-byte vectors — 4 exception types × (current EL with SP0 / current EL with SPx / lower EL AArch64 / lower EL AArch32). Each vector is up to 32 instructions.
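
The table layout described above can be sketched as a small address calculation. This is a host-side Python model, not code from the deck; the dictionary keys and the function name `vector_address` are made up for illustration.

```python
# Sketch: compute an AArch64 exception vector address from VBAR_ELx.
# All names here are illustrative assumptions, not architectural identifiers.

GROUP_BASE = {
    "current_sp0": 0x000,   # exception from current EL, using SP_EL0
    "current_spx": 0x200,   # exception from current EL, using SP_ELx
    "lower_a64":   0x400,   # exception from a lower EL running AArch64
    "lower_a32":   0x600,   # exception from a lower EL running AArch32
}

TYPE_OFFSET = {
    "sync":   0x000,
    "irq":    0x080,
    "fiq":    0x100,
    "serror": 0x180,
}

VECTOR_TABLE_ALIGN = 0x800  # 2 KB: 16 vectors x 128 bytes


def vector_address(vbar, origin, kind):
    """Return the address the PE branches to for the given exception."""
    if vbar % VECTOR_TABLE_ALIGN != 0:
        raise ValueError("VBAR_ELx must be 2 KB aligned")
    return vbar + GROUP_BASE[origin] + TYPE_OFFSET[kind]


if __name__ == "__main__":
    # An SVC from AArch64 EL0 is a synchronous exception from a lower EL.
    print(hex(vector_address(0x80000, "lower_a64", "sync")))
```

Each of the 16 (origin, type) pairs maps to a distinct 128-byte slot, which is why a single base register is enough.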

ESR — the syndrome register

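ESR_ELx.EC (bits 31:26) is a 6-bit exception class: data abort, instruction abort, SVC, HVC, SMC, illegal execution state, PAC failure, SVE access trap, and so on. An MTE tag check fault is reported as a data abort with its own fault status code rather than as a separate class. Linux decodes EC to dispatch.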
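
As a rough illustration of the syndrome layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0), the sketch below splits a raw ESR value into its fields. The `EC_NAMES` table is deliberately partial and the function name `decode_esr` is an assumption for this example; a real kernel handles many more classes.

```python
# Sketch: split an ESR_ELx value into EC / IL / ISS.
# EC_NAMES lists only a handful of well-known classes; it is not exhaustive.

EC_NAMES = {
    0x0E: "illegal execution state",
    0x15: "SVC from AArch64",
    0x16: "HVC from AArch64",
    0x17: "SMC from AArch64",
    0x19: "SVE access trap",
    0x20: "instruction abort, lower EL",
    0x21: "instruction abort, same EL",
    0x24: "data abort, lower EL",
    0x25: "data abort, same EL",
}


def decode_esr(esr):
    """Return the EC, IL and ISS fields of a raw syndrome value."""
    ec = (esr >> 26) & 0x3F       # exception class, bits [31:26]
    il = (esr >> 25) & 0x1        # 1 = trapped instruction was 32 bits long
    iss = esr & 0x1FFFFFF         # instruction-specific syndrome, bits [24:0]
    return {
        "ec": ec,
        "il": il,
        "iss": iss,
        "name": EC_NAMES.get(ec, "not in this partial table"),
    }


if __name__ == "__main__":
    # 0x96000045 is the kind of value seen in a kernel oops report:
    # a data abort taken from the same EL.
    print(decode_esr(0x96000045))
```

For the SVC class, the low 16 bits of ISS carry the immediate from `SVC #imm`, which is how a handler could recover it.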


A64 — The 64-bit Instruction Set

  • Fixed 32-bit encoding. No Thumb, no mixed length. Every instruction is 4 bytes.
  • 31 × 64-bit general registers X0–X30; access as 32-bit via W0–W30.
  • Register 31 is context-dependent: either the Zero Register (XZR/WZR) or the Stack Pointer (SP).
  • PC is not a general register (lesson learned from Armv7). Only reachable via branch + ADR/ADRP PC-relative generation.
  • Instructions are grouped by top bits — branch, load/store, data-processing-immediate, data-processing-register, SIMD/FP, system.
// Canonical A64 function prologue
stp  x29, x30, [sp, #-16]!   // push FP,LR
mov  x29, sp                 // set FP

// LDAR / STLR — release consistency
ldar  w0, [x1]               // acquire-load
stlr  w2, [x3]               // release-store

// LSE atomics (Armv8.1-A Large System Extensions): scale better than LL/SC loops
casal w4, w5, [x6]           // CAS, acquire+release (plain CAS is relaxed)
ldadd w7, w8, [x9]           // atomic add of w7, old value returned in w8

// PC-relative addressing
adrp  x0, sym                // 4 KB-page PC-rel
add   x0, x0, :lo12:sym      // page offset

// ERET — return from exception
eret                         // ELR_ELx -> PC, SPSR -> PSTATE
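
The grouping by top bits mentioned in the list above can be sketched as a classifier on bits 28:25 of the instruction word. This is a simplified host-side model; the function name `a64_group` and the group labels are illustrative, and anything outside the five main groups is lumped into one bucket.

```python
# Sketch: classify a 32-bit A64 instruction word by bits [28:25].
# Group labels are informal; reserved, SVE and SME encodings share one bucket.

def a64_group(insn):
    op0 = (insn >> 25) & 0xF
    if (op0 & 0b1110) == 0b1000:        # 100x
        return "data-processing (immediate)"
    if (op0 & 0b1110) == 0b1010:        # 101x
        return "branch / exception / system"
    if (op0 & 0b0101) == 0b0100:        # x1x0
        return "load/store"
    if (op0 & 0b0111) == 0b0101:        # x101
        return "data-processing (register)"
    if (op0 & 0b0111) == 0b0111:        # x111
        return "SIMD / floating-point"
    return "other (reserved, SVE, SME)"


if __name__ == "__main__":
    samples = {
        0xA9BF7BFD: "stp x29, x30, [sp, #-16]!",
        0x910003FD: "mov x29, sp",
        0xD65F03C0: "ret",
        0x0B010000: "add w0, w0, w1",
    }
    for word, text in samples.items():
        print("%-28s %s" % (text, a64_group(word)))
```

Because every instruction is 4 bytes, a decoder can make this first-level decision from four bits without knowing where the previous instruction ended.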

PSTATE — The Processor State

  • AArch64 replaces CPSR with PSTATE — a logical collection of fields, not a single register.
  • Core fields:
    • NZCV — condition flags (Negative, Zero, Carry, Overflow). Read via MRS x0, NZCV.
    • DAIF — interrupt masks (Debug, SError, IRQ, FIQ).
    • EL[1:0] — current Exception Level.
    • SP — 0 ⇒ SP_EL0, 1 ⇒ SP_ELx.
    • nRW — 0 ⇒ AArch64, 1 ⇒ AArch32.
  • Later extensions add more PSTATE fields, including PAN (Privileged Access Never, Armv8.1-A) and SSBS (Speculative Store Bypass Safe, Armv8.5-A).

SPSR is a snapshot of PSTATE

On exception entry, the full PSTATE is packed into SPSR_ELx bits (N=31, Z=30, C=29, V=28, D=9, A=8, I=7, F=6, M[3:0]=EL+SP). ERET unpacks it back.
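
The bit positions listed above are enough to unpack a saved SPSR value by hand. The sketch below does that for the AArch64 case; `decode_spsr` and `mode_name` are hypothetical helper names, and AArch32 mode encodings are not decoded.

```python
# Sketch: unpack the SPSR_ELx fields named in the text.
# Only the AArch64 layout of M[4:0] is interpreted.

FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "D": 9, "A": 8, "I": 7, "F": 6}


def decode_spsr(spsr):
    mode = spsr & 0x1F
    fields = {name: (spsr >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["aarch32"] = (mode >> 4) & 1   # M[4]: nRW, 1 means AArch32
    fields["el"] = (mode >> 2) & 0x3      # M[3:2]: exception level
    fields["sp_elx"] = mode & 1           # M[0]: 0 = SP_EL0, 1 = SP_ELx
    return fields


def mode_name(spsr):
    """Return the conventional name such as EL1h (handler) or EL0t (thread)."""
    f = decode_spsr(spsr)
    if f["aarch32"]:
        return "AArch32 mode"
    return "EL%d%s" % (f["el"], "h" if f["sp_elx"] else "t")


if __name__ == "__main__":
    # 0x3C5: all four DAIF masks set, returning to EL1 on SP_EL1.
    print(mode_name(0x3C5), decode_spsr(0x3C5))
```

A value like 0x3C5 is what boot code typically writes to SPSR before an ERET that drops into EL1 with interrupts masked.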

PAN — the bug-killer

PSTATE.PAN=1 makes privileged (EL1/EL2) data accesses to EL0-accessible memory fault. This defeats a whole class of bugs in which the kernel accidentally dereferences a user pointer. Introduced in Armv8.1-A.


Key System Registers

Register | Name | Role
SCTLR_ELx | System Control | MMU enable (M), caches (C, I), WXN, endianness
TTBR0_ELx / TTBR1_ELx | Translation Table Base | Low-half / high-half VA → page-table roots
TCR_ELx | Translation Control | Granule size, VA size, PA size, ASID width
MAIR_ELx | Memory Attribute Indirection | 8 × 8-bit memory-attribute encodings, selected by the AttrIndx field in PTEs
VBAR_ELx | Vector Base Address | Base of the exception vector table (2 KB aligned)
ESR_ELx | Exception Syndrome | Written on exception entry (EC + ISS)
ELR_ELx / SPSR_ELx | Saved return context | Return PC + saved PSTATE
SCR_EL3 | Secure Configuration | NS bit (world select), IRQ/FIQ route
HCR_EL2 | Hypervisor Configuration | VM trap configuration, stage-2 enable
MPIDR_EL1 | Multi-PE ID | Affinity levels Aff0..Aff3, identifies core in cluster
ID_AA64*_EL1 | Feature ID | Dozens of read-only registers describing which features are implemented
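
As one concrete case from the table, MPIDR_EL1 packs the four affinity fields into fixed byte positions (Aff0 in bits 7:0, Aff1 in 15:8, Aff2 in 23:16, Aff3 in 39:32). The sketch below extracts them from a raw value; the function name `mpidr_affinity` is an assumption, and how a given SoC maps affinity levels to threads, cores and clusters is implementation-specific.

```python
# Sketch: extract the affinity fields from a raw MPIDR_EL1 value.
# The meaning of each level (thread, core, cluster) varies by implementation.

def mpidr_affinity(mpidr):
    """Return (Aff3, Aff2, Aff1, Aff0) as a tuple of integers."""
    aff0 = mpidr & 0xFF
    aff1 = (mpidr >> 8) & 0xFF
    aff2 = (mpidr >> 16) & 0xFF
    aff3 = (mpidr >> 32) & 0xFF   # Aff3 sits above the 32-bit boundary
    return (aff3, aff2, aff1, aff0)


if __name__ == "__main__":
    # Bit 31 reads as one; the remaining bits here give Aff1 = 1, Aff0 = 1.
    print(mpidr_affinity(0x80000101))
```

Firmware interfaces such as PSCI identify a target core by these same affinity values, which is why boot code usually reads this register early.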

Synchronous vs Asynchronous Exceptions

Synchronous

  • Directly caused by an executing instruction. Precise.
  • SVC (EL0→EL1 syscall), HVC (EL1→EL2), SMC (EL1/2→EL3).
  • Data/instruction aborts (MMU faults, permission violations).
  • Illegal execution state, illegal PSTATE, PAC auth failure.
  • MTE synchronous tag check fault (if configured).

Asynchronous

  • External event, not caused by any particular instruction.
  • IRQ — maskable external interrupt (via GIC).
  • FIQ — higher-priority interrupt (historically "fast"; in AArch64 signalled separately from IRQ).
  • SError — System Error: catastrophic bus error / RAS event. Routed per HCR_EL2.AMO / SCR_EL3.EA.
  • Masked by PSTATE.DAIF bits.

The vector offset within a 2 KB table encodes both origin (same-EL vs lower-EL) and type (synchronous / IRQ / FIQ / SError) — so the CPU only needs one base register (VBAR_ELx) to dispatch to 16 distinct handlers.


Armv8.x Evolution — Where the Features Landed

Version | Announced | Feature highlights
Armv8.0-A | 2011 | Baseline: AArch64, A64, NEON 128-bit, Crypto (AES/SHA) optional
Armv8.1-A | 2014 | LSE atomics, VHE, PAN, 16-bit VMID, hardware Access flag / dirty state updates
Armv8.2-A | 2016 | Optional SVE, FP16, RAS, dot-product (UDOT/SDOT), PAN improvements, bfloat16 (added later)
Armv8.3-A | 2016 | Pointer Authentication (PAC), Nested Virtualization, JSCVT, RCpc loads
Armv8.4-A | 2017 | Secure EL2, MPAM, Activity Monitors (AMU), crypto refresh
Armv8.5-A | 2018 | MTE, BTI, SSBS, Random Number instructions (RNDR)
Armv8.6-A | 2019 | bfloat16 mandatory, Matmul Int8 (i8mm), Enhanced counter virtualization
Armv8.7-A | 2020 | WFET / WFIT (timed wait), LD64B / ST64B block ops, PCIe accelerator hints
Armv9.0-A | 2021 | Builds on v8.5: SVE2 baseline, RME/CCA, TRBE/BRBE
Armv9.2-A | 2021 | SME (Scalable Matrix), MPAM v1 refinements

AArch64 Calling Convention (AAPCS64)

  • X0–X7 — argument + return registers. Up to 8 integer args passed in registers.
  • X8 — indirect result register (struct return pointer), or syscall number in Linux.
  • X9–X15 — caller-saved temporaries.
  • X16 / X17 — IP0 / IP1 — intra-procedure-call scratch (for PLT thunks).
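  • X18 — platform register (reserved on macOS/iOS, TEB pointer on Windows; on Linux usable as a temporary unless the build reserves it, e.g. for a shadow call stack).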
  • X19–X28 — callee-saved.
  • X29 (FP) — frame pointer.
  • X30 (LR) — link register.
  • SP — 16-byte aligned at public function entry/exit.
// add_two(int a, int b) — trivial example

add_two:
    add   w0, w0, w1         // w0 = a + b
    ret                      // branch to LR

// function that calls another
caller:
    stp   x29, x30, [sp,#-32]! // push FP, LR
    mov   x29, sp             // set up FP
    mov   w0, #42             // arg0
    mov   w1, #7              // arg1
    bl    add_two             // call
    // result in w0
    ldp   x29, x30, [sp],#32  // pop FP, LR
    ret

Note: stp/ldp (store/load pair) are idiomatic in A64. There is no push/pop or LDM/STM, so register saves are built from pairs, two registers per instruction.
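
To make the register-passing rule concrete, here is a deliberately simplified model of where integer arguments land. It assumes every argument is a 64-bit integer or pointer, ignores floating-point, structs and variadic rules, and assumes the standard 8-byte stack slots; the helper names are invented for this sketch.

```python
# Sketch: where AAPCS64 places the first N integer/pointer arguments.
# Simplifying assumptions: all arguments are 64-bit scalars, no FP/SIMD,
# no aggregates, and each spilled argument takes an 8-byte stack slot.

NUM_ARG_REGS = 8  # X0..X7


def assign_int_args(count):
    """Return a list of locations, one per argument, in call order."""
    locations = []
    for index in range(count):
        if index < NUM_ARG_REGS:
            locations.append("x%d" % index)
        else:
            locations.append("[sp, #%d]" % ((index - NUM_ARG_REGS) * 8))
    return locations


def stack_arg_bytes(count):
    """Bytes of outgoing stack arguments, rounded up to keep SP 16-byte aligned."""
    spilled = max(0, count - NUM_ARG_REGS)
    return (spilled * 8 + 15) & ~15


if __name__ == "__main__":
    print(assign_int_args(10))
    print(stack_arg_bytes(10))
```

The rounding in `stack_arg_bytes` reflects the 16-byte SP alignment rule from the list above: even a single spilled argument costs a 16-byte adjustment.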


WFI / WFE — Low-Power Hints

  • WFI — Wait For Interrupt. Core enters a low-power state until any pending asynchronous exception or debug event.
  • WFE — Wait For Event. Enters low-power state until an event arrives: SEV from another core, SEVL locally, an interrupt, the exclusive monitor being cleared (another core wrote the watched lock word), or the timer event stream.
  • WFE is the cornerstone of spinlocks on AArch64 — spin loops do LDAXR / STXR / CBNZ → WFE so the hardware can park the core until the lock changes.
  • Armv8.7-A added WFIT / WFET — Wait-For-Interrupt/Event with timeout. Avoids the OS having to arm an extra timer to bound sleep.
// Spin-lock using WFE (idiomatic AArch64)
acquire:
  prfm  pstl1keep, [x0]
  mov   w2, #1        // value that marks the lock as held
1:
  ldaxr w1, [x0]      // acquire-load lock word, arms the exclusive monitor
  cbnz  w1, wait      // if held, WFE
  stxr  w1, w2, [x0]  // try to take the lock
  cbnz  w1, 1b        // retry if stxr failed
  ret
wait:
  wfe                 // park until SEV or the monitor is cleared
  b     1b

release:
  stlr  wzr, [x0]     // release-store clears it
  sev                 // wake any waiters
  ret
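
The interplay between the exclusive monitor and WFE in the loop above can be mimicked with a toy model. This is only a sketch of the behaviour described on this slide, with invented class and method names; it ignores memory ordering, SEV, interrupts and every other wake-up source.

```python
# Toy model: one lock word, a per-core exclusive monitor, and a per-core
# event flag. It illustrates why a waiter parked in WFE wakes up when the
# owner's release store hits the monitored location. Not a hardware model.

class LockModel:
    def __init__(self, cores):
        self.value = 0
        self.monitor = [False] * cores   # exclusive monitor armed?
        self.event = [False] * cores     # pending wake-up event?

    def _store(self, value, writer):
        self.value = value
        for core, armed in enumerate(self.monitor):
            if armed:
                self.monitor[core] = False
                if core != writer:
                    self.event[core] = True   # monitor cleared by another core

    def ldaxr(self, core):
        self.monitor[core] = True
        return self.value

    def stxr(self, core, value):
        """Return 0 on success and 1 on failure, like the status register."""
        if not self.monitor[core]:
            return 1
        self._store(value, core)
        return 0

    def stlr(self, core, value):
        self._store(value, core)

    def wfe(self, core):
        """Return True if the core continues, False if it would sleep."""
        if self.event[core]:
            self.event[core] = False
            return True
        return False


def try_acquire(lock, core):
    if lock.ldaxr(core) != 0:
        return False                  # held: the caller would execute WFE
    return lock.stxr(core, 1) == 0
```

In this model the waiter resumes as soon as the owner runs `stlr`, which mirrors why the release store alone is usually enough to wake parked waiters.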

A-profile vs M-profile — Why They Look Different

Aspect | Cortex-A (A-profile) | Cortex-M (M-profile)
Execution states | AArch64 + AArch32 | Thumb-only (no AArch64)
Privilege model | 4 Exception Levels (EL0-EL3) | 2 (Handler / Thread, × Priv/Unpriv)
Memory management | Full MMU (VMSAv8): paged, TLB | MPU: region-based, no translation
Interrupt controller | GIC (architectural, separate IP) | NVIC (part of the CPU itself)
Exception latency | Dozens of cycles, OS-mediated | Deterministic, as low as 12 cycles on Cortex-M3/M4, HW push
SIMD | NEON, SVE, SVE2, SME | Helium (MVE): 128-bit beatwise
Virtualization | EL2 hypervisor + stage-2 | None
Typical OS | Linux, Android, Windows, macOS/iOS | Zephyr, FreeRTOS, bare-metal

Both are "Arm" but architecturally incompatible — a Cortex-A binary will not run on a Cortex-M and vice versa. Different register models, different ISAs, different system registers, different interrupt models. The companion Cortex-M presentation covers the M side end-to-end.


What Changed in Armv9-A

Baseline in v9-A (optional or absent in v8)

  • SVE2 — the v9-A vector baseline, implemented by Arm's own v9-A cores. In the v8 era, SVE shipped in only a few designs, such as Fujitsu A64FX and Neoverse V1.
  • MTE — Memory Tagging. Introduced as an optional Armv8.5-A feature and carried into v9-A; an implementation can omit it, and software can leave it disabled.

New in v9-A

  • TRBE — Trace Buffer Extension.
  • RME — Realm Management Extension. Adds two worlds, Realm and Root (EL3), for four in total.
  • CCA — Confidential Compute Architecture. Enables cloud tenant isolation from the hypervisor.
  • SME (v9.2) — Scalable Matrix Extension, streaming-mode tiles.

Four worlds, not two

v8-A had Secure / Non-secure. v9-A with RME adds Realm (for CCA workloads) and Root (EL3 only, above all worlds). The Granule Protection Table (GPT) partitions physical memory by world.

Compat is preserved

Existing Armv8-A software continues to run unchanged on v9-A. All v9 features are discoverable via ID_AA64*_EL1 registers; OSes enable what they find.
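
Feature discovery through the ID registers comes down to reading 4-bit fields at known positions. The sketch below checks three of them (SVE in ID_AA64PFR0_EL1 bits 35:32, MTE in ID_AA64PFR1_EL1 bits 11:8, LSE atomics in ID_AA64ISAR0_EL1 bits 23:20). It assumes the raw register values have already been read by privileged code and passed in as a dictionary; the names `FEATURE_FIELDS`, `id_field` and `detect` are hypothetical.

```python
# Sketch: test a few ID register fields for feature presence.
# Input is a dict of raw register values obtained elsewhere (an assumption);
# for these three fields, a non-zero value means the feature is implemented.

FEATURE_FIELDS = {
    "SVE":         ("ID_AA64PFR0_EL1", 32),
    "MTE":         ("ID_AA64PFR1_EL1", 8),
    "LSE atomics": ("ID_AA64ISAR0_EL1", 20),
}


def id_field(value, lsb, width=4):
    """Extract an unsigned field of the given width starting at bit lsb."""
    return (value >> lsb) & ((1 << width) - 1)


def detect(regs):
    """Map each feature name to True if its ID field is non-zero."""
    found = {}
    for name, (reg, lsb) in FEATURE_FIELDS.items():
        found[name] = id_field(regs.get(reg, 0), lsb) != 0
    return found


if __name__ == "__main__":
    sample = {
        "ID_AA64PFR0_EL1": 1 << 32,    # SVE field = 1
        "ID_AA64PFR1_EL1": 0x2 << 8,   # MTE field = 2
        "ID_AA64ISAR0_EL1": 0x2 << 20, # Atomic field = 2
    }
    print(detect(sample))
```

Not every ID field follows the non-zero convention (some use 0xF for "not implemented"), so a fuller version would need per-field rules.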


Lessons

  • "Why four ELs?" → EL0 user, EL1 OS, EL2 hypervisor, EL3 secure monitor. Matches the x86 ring model conceptually but with a clean entry/exit mechanism (ERET) and a secure-world bit that can only be flipped at EL3.
  • "How is an AArch64 syscall different from an AArch32 SVC?" → AArch64 SVC #imm traps synchronously to EL1 at VBAR_EL1 + 0x400 (lower EL AArch64 sync offset). Syscall number is in X8 on Linux, args in X0-X5. An AArch32 task under the same kernel enters at VBAR_EL1 + 0x600 instead and passes the number in R7.
  • "What does VHE do?" → Virtualization Host Extensions (Armv8.1) let Linux run in EL2 directly: with HCR_EL2.E2H set, EL1 system-register accesses made at EL2 are redirected to their EL2 equivalents, so KVM needs no split host/hypervisor.
  • "Why is A64 fixed 32-bit?" → simpler fetch + decode, enables wider pipelines. Density loss is recovered by denser instructions (LDP/STP pair) and better IPC.
  • "What's NZCV + DAIF?" → PSTATE condition flags + interrupt masks. DAIF = Debug, SError, IRQ, FIQ. MSR DAIFSet, #0xF masks all four; MSR DAIFClr, #imm unmasks the selected ones.
  • "Where does secure-world code live?" → EL3 (secure monitor / TF-A) and Secure EL1/0 (trusted OS like OP-TEE and trusted apps). EL3 is the only level that can toggle the NS bit.
  • "v9 is v8.5 + what?" → + SVE2 as the vector baseline, MTE carried forward, + RME/CCA, + TRBE.
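
The DAIFSet / DAIFClr answer above can be illustrated with a small model of how the 4-bit immediate maps onto the mask bits. It assumes the layout seen when reading the DAIF register (D, A, I, F in bits 9 down to 6); the function names are invented for this sketch.

```python
# Sketch: effect of MSR DAIFSet / DAIFClr on the DAIF mask bits.
# The immediate uses bit 3 = D, bit 2 = A, bit 1 = I, bit 0 = F;
# the register view places the same four bits at [9:6].

DAIF_SHIFT = 6


def daif_set(daif, imm):
    """Model MSR DAIFSet, #imm: mask the selected exception types."""
    return daif | ((imm & 0xF) << DAIF_SHIFT)


def daif_clr(daif, imm):
    """Model MSR DAIFClr, #imm: unmask the selected exception types."""
    return daif & ~((imm & 0xF) << DAIF_SHIFT)


if __name__ == "__main__":
    masked = daif_set(0, 0xF)          # mask Debug, SError, IRQ, FIQ
    irq_on = daif_clr(masked, 0x2)     # unmask IRQ only
    print(hex(masked), hex(irq_on))
```

Because set and clear are separate write-only operations, code can change one mask bit without a read-modify-write of the whole field.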

Further Reading & Sibling Decks

Primary sources

  • DDI 0487 — Arm Architecture Reference Manual for A-profile — the canonical 9000-page spec. Search by exception class, register, or instruction.
  • AAPCS64 — Procedure Call Standard for AArch64 — defines calling convention.
  • ARM DEN 0024A — Programmer's Guide for Armv8-A — friendlier intro volume.
  • ARM DEN 0029 — SBSA (Server Base System Architecture) — where EL assumptions tighten for servers.
  • Learn the architecture — developer.arm.com/documentation — free tutorials keyed to each feature.

References

Arm Ltd. — Arm Architecture Reference Manual for A-profile (DDI 0487) — definitive spec
Arm Ltd. — Learn the architecture series (free on developer.arm.com) — ELs, AArch64 registers, exception model guides
ARM DEN 0024A — Programmer's Guide for Armv8-A — accessible A64 + exception model tutorial
ARM DEN 0029 — Server Base System Architecture (SBSA) — assumptions Neoverse makes about A-profile
David A. Patterson & John Hennessy — Computer Organization and Design: ARM Edition (Morgan Kaufmann) — educational intro
Daniel Kusswurm — Modern Arm Assembly Language Programming (Apress, 2020) — A64 from scratch
Marc Andreessen Horowitz / A-Ha Moment podcasts — Arm architect oral histories (Richard Grisenthwaite)
Linux kernel — arch/arm64/ — the canonical open-source reader of every A-profile feature
