Armv8-A introduced a second execution state, AArch64, with a completely new instruction set (A64), new exception model, new register file. AArch32 was kept for backwards compatibility but is being deprecated piece-by-piece.
Armv9-A keeps the same profile and the same execution states as Armv8-A, but mandates SVE2 and MTE and adds RME / CCA. Semantically it is an extension, marketed as a new major version.
| Aspect | AArch64 | AArch32 |
|---|---|---|
| Register file | 31 × 64-bit GPRs (X0–X30) + XZR + SP + PC | 16 × 32-bit (R0–R12, SP, LR, PC) + banked per mode |
| Instruction encoding | Fixed 32-bit (A64) | Mixed 16/32-bit (Thumb-2) + 32-bit Arm |
| Condition codes | Select instructions only (CSEL/CSINC). No universal CCs. | Almost every instruction conditional |
| SIMD / FP | 32 × 128-bit V0–V31 (NEON + FP unified) | 32 × 64-bit D0–D31 (VFPv4/NEON overlay) |
| Exception entry | To fixed EL; ELR_ELx / SPSR_ELx | To privileged mode; banked SP_usr/svc/irq etc. |
| Memory model | Weak, with explicit LDAR / STLR + LSE atomics | Weak; no LSE (Armv8.1 LSE is AArch64-only) |
| Status in 2025 | Universal flagship state | Removed from X2+, A520+ at EL1-EL3; still allowed at EL0 on some cores |
A PE (Processing Element) selects execution state per Exception Level — e.g. EL1 running AArch64 can still have EL0 running AArch32 user-mode apps on cores that allow it.
VBAR_ELx points to a 2 KB-aligned table of 16 × 128-byte vectors: 4 exception types (synchronous, IRQ, FIQ, SError) × 4 origins (current EL with SP0, current EL with SPx, lower EL in AArch64, lower EL in AArch32). Each vector holds up to 32 instructions.
ESR_ELx.EC (bits 31:26) gives the exception class (26 defined): data abort, instruction abort, SVC, HVC, SMC, illegal state, PAC failure, SVE access, MTE tag check, etc. Linux decodes this to dispatch.
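The field split described above can be sketched in a few lines of Python. This is a minimal illustration, not how any kernel implements it: `decode_esr` and `EC_NAMES` are hypothetical names, the name table is deliberately partial, and it assumes the 32-bit ESR layout with EC in bits 31:26, IL in bit 25, and ISS in bits 24:0.

```python
# Partial table of exception classes; many more are defined in the Arm ARM.
EC_NAMES = {
    0x00: "Unknown reason",
    0x0E: "Illegal Execution state",
    0x15: "SVC from AArch64",
    0x16: "HVC from AArch64",
    0x17: "SMC from AArch64",
    0x20: "Instruction Abort from a lower EL",
    0x21: "Instruction Abort from the current EL",
    0x24: "Data Abort from a lower EL",
    0x25: "Data Abort from the current EL",
}


def decode_esr(esr):
    """Split a 32-bit ESR_ELx value into EC, IL and ISS."""
    ec = (esr >> 26) & 0x3F   # exception class, bits 31:26
    il = (esr >> 25) & 0x1    # instruction length bit, bit 25
    iss = esr & 0x1FFFFFF     # instruction-specific syndrome, bits 24:0
    name = EC_NAMES.get(ec, "not in this partial table")
    return {"ec": ec, "il": il, "iss": iss, "name": name}


svc = decode_esr(0x56000000)    # SVC #0 taken from AArch64
abort = decode_esr(0x96000045)  # data abort taken at the current EL
print(hex(svc["ec"]), svc["name"])
print(hex(abort["ec"]), abort["name"], hex(abort["iss"]))
```

A dispatcher would typically switch on the `ec` value first and only then interpret `iss`, whose meaning depends on the class.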
// Canonical A64 function prologue
stp x29, x30, [sp, #-16]! // push FP,LR
mov x29, sp // set FP
// LDAR / STLR — release consistency
ldar w0, [x1] // acquire-load
stlr w2, [x3] // release-store
// LSE atomics (Armv8.1-A) — ACE scalability
casal w4, w5, [x6] // CAS, acquire+release (plain cas has no ordering)
ldadd w7, w8, [x9] // atomic add, return old
// PC-relative addressing
adrp x0, sym // 4 KB-page PC-rel
add x0, x0, :lo12:sym // page offset
// ERET — return from exception
eret // ELR_ELx -> PC, SPSR -> PSTATE
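The arithmetic behind the adrp / add :lo12: pair in the snippet above can be modelled in Python. This is a sketch under stated assumptions: `adrp_add` is a hypothetical helper, and the addresses in the usage lines are made up. It assumes ADRP encodes a signed 21-bit page delta (roughly ±4 GB of reach) and that the linker fills in the low 12 bits of the symbol address for the add.

```python
def adrp_add(pc, sym):
    """Model `adrp x0, sym` followed by `add x0, x0, :lo12:sym`."""
    page_delta = (sym >> 12) - (pc >> 12)  # immediate the linker places in ADRP
    if not -(1 << 20) <= page_delta < (1 << 20):
        raise ValueError("symbol is outside ADRP's signed 21-bit page range")
    lo12 = sym & 0xFFF                     # value of :lo12:sym
    x0 = (pc & ~0xFFF) + (page_delta << 12)  # adrp: 4 KB page base of sym
    x0 += lo12                               # add: offset within the page
    return page_delta, lo12, x0


forward = adrp_add(pc=0x400124, sym=0x412345)
backward = adrp_add(pc=0x500000, sym=0x4FF010)
print([hex(v) for v in forward])
print(backward[0], hex(backward[2]))
```

Because only the page delta depends on where the code is loaded relative to the symbol, the pair is position-independent without a literal pool.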
On exception entry, the full PSTATE is packed into SPSR_ELx bits (N=31, Z=30, C=29, V=28, D=9, A=8, I=7, F=6, M[3:0]=EL+SP). ERET unpacks it back.
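As a rough illustration of the packing described above, the following Python sketch builds and decodes an SPSR_ELx value using only the bit positions listed in that sentence. `pack_spsr` and `unpack_spsr` are hypothetical helper names, and other SPSR fields (for example M[4], which selects the execution state) are left at zero here.

```python
# Bit positions taken from the list above.
BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "D": 9, "A": 8, "I": 7, "F": 6}


def pack_spsr(flags, el, use_sp_elx):
    """Build an SPSR value from a set of flag names, an EL and the SP choice."""
    if el == 0 and use_sp_elx:
        raise ValueError("EL0 always uses SP_EL0")
    value = 0
    for name in flags:
        value |= 1 << BITS[name]
    value |= (el & 0x3) << 2         # M[3:2] = exception level
    value |= 1 if use_sp_elx else 0  # M[0] = 1 selects SP_ELx, 0 selects SP_EL0
    return value


def unpack_spsr(spsr):
    """Recover (flags, el, use_sp_elx) from an SPSR value."""
    flags = {name for name, bit in BITS.items() if (spsr >> bit) & 1}
    return flags, (spsr >> 2) & 0x3, bool(spsr & 1)


el1h_masked = pack_spsr({"D", "A", "I", "F"}, el=1, use_sp_elx=True)
print(hex(el1h_masked))
print(unpack_spsr(el1h_masked))
```

Writing a value like this to SPSR_ELx and then executing ERET is the usual way boot code drops to a lower exception level with interrupts masked.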
PSTATE.PAN=1 makes EL1/EL2 data accesses to EL0-accessible memory fault. This defeats a whole class of kernel bugs in which the kernel accidentally dereferences a user pointer. Introduced in Armv8.1-A.
| Register | Name | Role |
|---|---|---|
| SCTLR_ELx | System Control | MMU enable (M), caches (C, I), WXN, endianness |
| TTBR0_ELx / TTBR1_ELx | Translation Table Base | Low-half / high-half VA → page-table roots |
| TCR_ELx | Translation Control | Granule size, VA size, PA size, ASID width |
| MAIR_ELx | Memory Attribute Indirection | 8 × 8-bit memory-attribute encodings, selected by the AttrIndx field in PTEs |
| VBAR_ELx | Vector Base Address | Base of the exception vector table (2 KB aligned) |
| ESR_ELx | Exception Syndrome | Written on exception entry (EC + ISS) |
| ELR_ELx / SPSR_ELx | Saved return context | Return PC + saved PSTATE |
| SCR_EL3 | Secure Configuration | NS bit (world select), IRQ/FIQ route |
| HCR_EL2 | Hypervisor Configuration | VM trap configuration, stage-2 enable |
| MPIDR_EL1 | Multi-PE ID | Affinity levels Aff0..Aff3 — identifies core in cluster |
| ID_AA64*_EL1 | Feature ID | Dozens of read-only registers — what features are implemented |
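To make one row of the table concrete, here is a small Python sketch that pulls the affinity fields out of an MPIDR_EL1 value. `mpidr_affinity` is a hypothetical helper and the sample values are invented. It assumes the standard layout: Aff0 in bits 7:0, Aff1 in bits 15:8, Aff2 in bits 23:16, and Aff3 in bits 39:32.

```python
def mpidr_affinity(mpidr):
    """Extract Aff0..Aff3 from a 64-bit MPIDR_EL1 value."""
    return {
        "aff0": mpidr & 0xFF,
        "aff1": (mpidr >> 8) & 0xFF,
        "aff2": (mpidr >> 16) & 0xFF,
        "aff3": (mpidr >> 32) & 0xFF,  # Aff3 sits above the 32-bit boundary
    }


small = mpidr_affinity(0x80000103)
large = mpidr_affinity(0x0000000281000200)
print(small)
print(large)
```

How the levels map to threads, cores, and clusters is implementation-specific, so firmware tables rather than fixed rules normally describe the topology.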
The vector offset within a 2 KB table encodes both origin (same-EL vs lower-EL) and type (synchronous / IRQ / FIQ / SError) — so the CPU only needs one base register (VBAR_ELx) to dispatch to 16 distinct handlers.
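The offset encoding described above can be written out as a short Python sketch. `vector_address`, `ORIGINS`, and `TYPES` are hypothetical names and the base address is made up. It assumes each origin group spans 0x200 bytes and each exception type 0x80 bytes within its group.

```python
ORIGINS = ["current_el_sp0", "current_el_spx", "lower_el_aarch64", "lower_el_aarch32"]
TYPES = ["sync", "irq", "fiq", "serror"]


def vector_address(vbar, origin, kind):
    """Return the handler entry address for one of the 16 vectors."""
    if vbar & 0x7FF:
        raise ValueError("VBAR_ELx must be 2 KB aligned")
    return vbar + ORIGINS.index(origin) * 0x200 + TYPES.index(kind) * 0x80


base = 0x80000
irq_from_user = vector_address(base, "lower_el_aarch64", "irq")
sync_same_el = vector_address(base, "current_el_spx", "sync")
last_entry = vector_address(base, "lower_el_aarch32", "serror")
print(hex(irq_from_user), hex(sync_same_el), hex(last_entry))
```

The last entry starts at offset 0x780, so the whole table ends exactly at the 2 KB boundary.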
| Version | Year | Feature highlights |
|---|---|---|
| Armv8.0-A | 2011 | Baseline: AArch64, A64, NEON 128-bit, Crypto (AES/SHA) optional |
| Armv8.1-A | 2014 | LSE atomics, VHE, PAN, vCPU IDs, AP alternate path |
| Armv8.2-A | 2016 | Optional SVE, FP16, RAS, dot-product (UDOT/SDOT), PAN improvements, bfloat16 (later) |
| Armv8.3-A | 2017 | Pointer Authentication (PAC), Nested Virtualization, JSCVT, RCpc loads |
| Armv8.4-A | 2018 | Secure EL2, MPAM, Activity Monitors (AMU), crypto refresh |
| Armv8.5-A | 2019 | MTE, BTI, SSBS, Random Number instructions (RNDR) |
| Armv8.6-A | 2020 | bfloat16 mandatory, Matmul Int8 (i8mm), Enhanced counter virtualization |
| Armv8.7-A | 2021 | WFET / WFIT (timed wait), LD64B / ST64B block ops, PCIe accelerator hints |
| Armv9.0-A | 2021 | = v8.5 + mandatory SVE2 + RME/CCA + TRBE/BRBE |
| Armv9.2-A | 2023 | SME (Scalable Matrix), MPAM v1 refinements |
// add_two(int a, int b) — trivial example
add_two:
add w0, w0, w1 // w0 = a + b
ret // branch to LR
// function that calls another
caller:
stp x29, x30, [sp,#-32]! // push FP, LR
mov x29, sp // set up FP
mov w0, #42 // arg0
mov w1, #7 // arg1
bl add_two // call
// result in w0
ldp x29, x30, [sp],#32 // pop FP, LR
ret
Note: stp/ldp (store pair / load pair) are idiomatic in A64. Each moves two registers per instruction, and they take the place of AArch32's push/pop (LDM/STM), which A64 does not provide.
// Spin-lock using WFE (idiomatic AArch64)
acquire:
mov w2, #1 // value stored to mark the lock as held
prfm pstl1keep, [x0] // prefetch the lock line for store
1:
ldaxr w1, [x0] // acquire-load lock word
cbnz w1, wait // if held, WFE
stxr w1, w2, [x0] // try to take (w2=1)
cbnz w1, 1b // retry if stxr failed
ret
wait:
wfe // park core until SEV
b 1b
release:
stlr wzr, [x0] // release-store clears it
sev // wake any waiters
ret
| Aspect | Cortex-A (A-profile) | Cortex-M (M-profile) |
|---|---|---|
| Execution states | AArch64 + AArch32 | Thumb-only (no AArch64) |
| Privilege model | 4 Exception Levels (EL0-EL3) | 2 (Handler / Thread, × Priv/Unpriv) |
| Memory management | Full MMU (VMSAv8) — paged, TLB | MPU — region-based, no translation |
| Interrupt controller | GIC (architectural, separate IP) | NVIC (part of the CPU itself) |
| Exception latency | Dozens of cycles, OS-mediated | Deterministic, 12 cycles, HW push |
| SIMD | NEON, SVE, SVE2, SME | Helium (MVE) — 128-bit beatwise |
| Virtualization | EL2 hypervisor + stage-2 | None |
| Typical OS | Linux, Android, Windows, macOS/iOS | Zephyr, FreeRTOS, bare-metal |
Both are "Arm" but architecturally incompatible — a Cortex-A binary will not run on a Cortex-M and vice versa. Different register models, different ISAs, different system registers, different interrupt models. The companion Cortex-M presentation covers the M side end-to-end.
v8-A had Secure / Non-secure. v9-A with RME adds Realm (for CCA workloads) and Root (EL3 only, above all worlds). The Granule Protection Table (GPT) partitions physical memory by world.
Existing Armv8-A software continues to run unchanged on v9-A. All v9 features are discoverable via ID_AA64*_EL1 registers; OSes enable what they find.
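Feature discovery of this kind boils down to reading 4-bit fields out of the ID registers. The Python sketch below shows the idea on made-up register values. `field` and `discover` are hypothetical helpers, and the field positions used (ID_AA64PFR0_EL1.EL0 at bits 3:0 and .SVE at bits 35:32, ID_AA64ISAR0_EL1.Atomic at bits 23:20, ID_AA64PFR1_EL1.BT at bits 3:0) are stated here as assumptions to check against the Arm ARM.

```python
def field(reg, lsb):
    """Return the 4-bit ID field whose least significant bit is `lsb`."""
    return (reg >> lsb) & 0xF


def discover(pfr0, pfr1, isar0):
    """Summarise a few features from raw ID register values."""
    return {
        "el0_aarch32": field(pfr0, 0) == 0b0010,  # EL0 supports both states
        "sve": field(pfr0, 32) != 0,
        "lse_atomics": field(isar0, 20) >= 0b0010,
        "bti": field(pfr1, 0) != 0,
    }


# Hypothetical values, not taken from any real core.
modern = discover(pfr0=(1 << 32) | 0x1111, pfr1=0x1, isar0=0x2 << 20)
legacy = discover(pfr0=0x2222, pfr1=0x0, isar0=0x0)
print(modern)
print(legacy)
```

An OS would read the real values with MRS at EL1 and then enable only what the fields report, which is what lets one kernel image run across v8-A and v9-A cores.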