ARM CORTEX-A · PRESENTATION 01

History & the Cortex-A Family

From ARM11 to Cortex-X925 · Two decades of the Application Profile
ARM11 · Cortex-A8 · A9 · A15 · A53 · A72-A78 · Cortex-X · DynamIQ · Armv9-A
02

Why the A-Profile Matters

  • The A-profile is the part of Arm that ships in every smartphone, tablet and Chromebook, and increasingly in laptops and servers: billions of cores per year.
  • It is also where Arm does its hardest engineering: out-of-order execution, SVE, full MMUs, big.LITTLE / DynamIQ, hypervisor support.
  • The 2004 Cortex profile split created three families: M (microcontroller), R (real-time) and A (application). A is the family that generates the bulk of Arm's royalty revenue.
  • Two decades on: ARM11 → Cortex-X925 is the story of mobile computing itself — and the template Neoverse later reused for servers.
"When we split the Cortex profiles in 2004, we weren't sure the A-profile would outgrow the classic ARM9/11 lines. Ten years later it was everywhere." — Mike Muller, Arm CTO (2004-2017), paraphrased

Understanding A-profile history is how you answer "why do modern Arm SoCs look the way they do?" Nearly every design choice, from ASID-tagged TLBs to DynamIQ clusters, traces to a specific mobile-era pain point; SVE, which came from HPC, is the notable exception.

03

Pre-Cortex: the ARM11 Classic Cores (2002-2008)

  • ARM1136JF-S: the first ARMv6 core. 8-stage pipeline, Jazelle bytecode acceleration, VFPv2 floating point and the new ARMv6 media SIMD instructions.
  • ARM11 cores shipped in the original iPhone and iPhone 3G (ARM1176JZF-S) and in Nokia Nseries phones such as the N95 (ARM1136 in TI's OMAP2420).
  • ARM1176JZF-S introduced TrustZone. The Raspberry Pi 1 and Pi Zero / Zero W (BCM2835) still use this core today (2025).
  • ARM11 MPCore (2004) — Arm's first multi-core effort. Up to 4 cores with snoop-control unit. Proved SMP was feasible in the Arm ISA.
  • Limitations: Armv6 only (no NEON SIMD), a single-issue in-order pipeline and modest IPC. Needed a clean-sheet successor.

The 2004 Cortex profile split

One ISA, three mutually incompatible profiles with different MMUs, exception models, and licence bundles:

  • Cortex-A — Application: full MMU, Linux-class.
  • Cortex-R — Real-time: MPU, deterministic.
  • Cortex-M — Microcontroller: Thumb-only, NVIC.

Cortex-A8 was the first A

Announced 2005, shipping silicon 2007-2009. The profile split made sense only once the first A-core proved it.

04

Cortex-A8 — First of the Line (2005 / 2009 ship)

  • First Armv7-A core: introduced the application-profile ISA that Linux, iOS and Android ran on until the AArch64 transition (2013 for iOS, 2014 onwards for Android).
  • 13-stage dual-issue in-order pipeline. No OoO — Arm deliberately kept it simple to hit 600 MHz – 1 GHz in 65 nm mobile process.
  • First Arm with NEON (Advanced SIMD) — 128-bit integer / single-precision SIMD. Completely reshaped the mobile media pipeline.
  • Separate L1 I/D caches plus an optional integrated unified L2 of up to 1 MB; a large L2 became a fixture of phone SoCs from here on.
  • Shipping platforms: TI OMAP3 (Nokia N900, BeagleBoard), Samsung S5PC100 (iPhone 3GS), Samsung Exynos 3110 "Hummingbird" (Galaxy S), Freescale i.MX51, Apple A4.
Cortex-A8 (2005 IP, 2009 silicon)
Architecture ...... Armv7-A
Pipeline .......... 13-stage dual-issue
Execution ......... in-order
SIMD .............. NEON (128-bit)
FPU ............... VFPv3
Clock target ...... 600 MHz – 1 GHz
L2 ................ optional, up to 1 MB
MMU ............... VMSAv7 (32-bit)

The "smartphone CPU" that made modern smartphones possible.

05

Cortex-A9 — First Multi-Core & First OoO (2007 / 2011)

  • First Arm with out-of-order execution — dual-issue, 8-stage, speculative, with a proper ROB and rename.
  • First multi-core Cortex — the Cortex-A9 MPCore (up to 4 cores) with an integrated SCU (Snoop Control Unit) and shared L2.
  • Flagship platforms: Nvidia Tegra 2 / Tegra 3 (Tegra 3 was the first quad-core Android SoC), TI OMAP4, Samsung Exynos 4210 (Galaxy S II) and 4412 (Galaxy S III international), Apple A5 (iPhone 4S, iPad 2).
  • Made TrustZone and the PL310 L2 cache controller standard fixtures of mainstream phone SoCs.
  • The A9 era is when "Arm in anything" became "Arm in everything": Cortex-A shipments passed the billion-unit mark around 2011.

Why A9 chose OoO

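Mobile workloads (browser JavaScript, JPEG decode, video post-processing) are cache-bound and branchy. The in-order A8 stalled on cache misses that an out-of-order core can partly hide by executing independent instructions in the meantime. Going to 2-wide OoO delivered roughly 25% higher IPC than A8 at the same clock (Arm's figures: about 2.5 vs 2.0 DMIPS/MHz).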
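A toy cycle-count model makes the cache-miss argument concrete. It is a sketch under heavy simplifying assumptions (1-wide issue, an unbounded out-of-order window, made-up latencies, no branches) and does not model the real A8 or A9 pipelines. The in-order core stalls at the first instruction that needs the missing load; the out-of-order core keeps issuing independent work underneath the miss.

```python
from dataclasses import dataclass, field

@dataclass
class Insn:
    latency: int                                   # cycles until the result is ready
    deps: list[int] = field(default_factory=list)  # indices of producer instructions

def in_order_cycles(prog: list[Insn]) -> int:
    """1-wide in-order issue: an instruction never issues before its predecessor
    and stalls until all of its operands are ready (stall-on-use)."""
    done = [0] * len(prog)
    issue = -1
    for i, insn in enumerate(prog):
        ready = max((done[d] for d in insn.deps), default=0)
        issue = max(issue + 1, ready)
        done[i] = issue + insn.latency
    return max(done)

def out_of_order_cycles(prog: list[Insn]) -> int:
    """1-wide out-of-order issue, unbounded window: each cycle, issue the oldest
    instruction whose operands are ready."""
    done = [None] * len(prog)
    cycle = 0
    while any(d is None for d in done):
        for i, insn in enumerate(prog):
            if done[i] is None and all(
                done[d] is not None and done[d] <= cycle for d in insn.deps
            ):
                done[i] = cycle + insn.latency
                break  # only one instruction issues per cycle
        cycle += 1
    return max(done)

# Hypothetical program: a load that misses (20 cycles), one consumer of it,
# then ten ALU ops that do not depend on the load.
prog = [Insn(20), Insn(1, [0])] + [Insn(1) for _ in range(10)]
print(in_order_cycles(prog), out_of_order_cycles(prog))  # 31 21
```

The ten cycles saved are exactly the independent instructions that ran under the miss. On a fully dependent chain both models give the same count, which is why OoO pays off only when there is independent work nearby to find.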

Cortex-A5 sibling

A tiny in-order Armv7-A core announced in 2009, two years after A9, for low-cost phones and set-top boxes. First hint of the "big/little" idea.

06

Cortex-A15 + A7 — big.LITTLE is Born (2011)

  • Cortex-A15 — 3-issue OoO, 15-stage, LPAE (40-bit physical addresses), hardware virtualization (first Arm with EL2-equivalent Hyp mode). Designed by Arm Austin.
  • Much faster than A9 but ~3× the power. Pairing it with a Cortex-A7 (in-order, A5-like but with virt support) gave Arm their big.LITTLE story.
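  • CCI-400 cache-coherent interconnect (AMBA 4 ACE) glues the two clusters. Early software switched whole clusters; the In-Kernel Switcher (IKS) then paired each big core with a LITTLE one, and Global Task Scheduling (GTS) finally let the scheduler place individual threads on either cluster.
  • First devices: Exynos 5 Octa (Galaxy S4 and Note 3 international models), MediaTek MT8135 tablets, Renesas R-Car H2.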
  • Remembered as a rough first generation — thread-migration latency, OS support, thermal throttling all needed a second pass.
Figure: big.LITTLE (2011-2017): a Cortex-A15 big cluster (3-wide OoO) and a Cortex-A7 LITTLE cluster (in-order), joined by the CCI-400 (ACE-coherent) interconnect to DRAM / L3 / GPU / display.
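Global Task Scheduling moved individual threads between clusters based on their tracked load. The sketch below shows only the core idea: an up-migration threshold and a lower down-migration threshold, with a hysteresis band between them so a thread whose load hovers near one boundary does not bounce between clusters. The threshold values, the 0.0-1.0 load scale and the function names are hypothetical, not the tunables of the actual Linux patches.

```python
UP_THRESHOLD = 0.80    # hypothetical: move to big above 80% tracked load
DOWN_THRESHOLD = 0.30  # hypothetical: move back to LITTLE below 30%

def next_cluster(current: str, load: float) -> str:
    """Return 'big' or 'LITTLE' for a thread, given its tracked load (0.0-1.0)."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # inside the hysteresis band: stay where we are

def trace(loads: list[float], start: str = "LITTLE") -> list[str]:
    """Cluster placement after each load sample."""
    placement, cluster = [], start
    for load in loads:
        cluster = next_cluster(cluster, load)
        placement.append(cluster)
    return placement

print(trace([0.2, 0.9, 0.6, 0.5, 0.2, 0.5]))
# ['LITTLE', 'big', 'big', 'big', 'LITTLE', 'LITTLE']
```

Every migration also lands the thread on the other cluster's cold L2, one reason the hysteresis band matters.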
07

The AArch64 Pivot — Cortex-A53 / A57 (2012 IP, 2014 silicon)

  • Armv8-A announced October 2011 — first Arm with a 64-bit ISA (A64), 31 × 64-bit GPRs, fixed 32-bit encoding, redesigned exception model.
  • Cortex-A57 — big core, 3-wide OoO, ~40% faster than A15. Cortex-A53 — little core, in-order, replaces A7 in a new big.LITTLE pair.
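  • Also launched later: Cortex-A35 (2015), the most power-efficient Armv8-A core, aimed at wearables and low-end devices as the successor to the A7 / A5 class.
  • First shipping Armv8-A silicon: Apple A7 (Sept 2013), built on Apple's own microarchitecture (Cyclone) rather than Arm's cores, beat Arm's designs to market by about a year.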
  • Arm-core Armv8 mass shipment: Samsung Exynos 5433 (Note 4), Qualcomm Snapdragon 810 (A57+A53), MediaTek Helio X10.
  • Armv8-A also brought Advanced SIMD (NEON) with double-precision floating point, load-acquire / store-release instructions (LDAR / STLR), and the optional AES / SHA Cryptographic Extension.

Why 64-bit mattered for phones

Not the address space; it was the register file. AArch64's 31 × 64-bit general-purpose registers dwarf Armv7's 16 × 32-bit (of which SP, LR and PC take three), a change often credited with cutting spills in compiled code by around 30%. A64 also came with a more rigorously specified memory-ordering model.

The Apple A7 shock

Apple silicon engineering was years ahead — they shipped the first Armv8-A phone SoC while Arm's own big cores were still a year from silicon. Arm has been chasing them ever since.

08

Annual Flagships — A72, A73, A75, A76, A77, A78

Core | Year | Design site | Pipeline | Key addition
Cortex-A72 | 2015 | Austin | 3-wide OoO, 15-stage | Major IPC jump over A57; first "premium" Arm core
Cortex-A73 | 2016 | Sophia (ex-TI) | 2-wide OoO, 11-stage | Narrower and shorter: area and power wins for mid-range
Cortex-A75 | 2017 | Sophia | 3-wide OoO, DynamIQ-native | First DynamIQ-capable big core; Armv8.2-A
Cortex-A76 | 2018 | Austin | 4-wide OoO, 13-stage | Armv8.2-A; laptop-class target, basis of Neoverse N1 and A76AE
Cortex-A77 | 2019 | Austin | 4-wide OoO, 13-stage | ~20% IPC over A76; adds a macro-op cache
Cortex-A78 | 2020 | Austin | 4-wide OoO, 13-stage | Efficiency tune of A77 for 5 nm; Armv8.2-A

A72 → A78 were the "big" half of big.LITTLE / DynamIQ. Cadence: one flagship per year, aligned to the annual smartphone SoC tape-out window. Austin has designed most of these cores since A57, with the Sophia team responsible for the mid-range-focused A73.

09

Cortex-X "Custom Cores" (2020-)

  • From 2020 Arm split the big core into two yearly deliveries: a Cortex-A flagship (balanced) and a Cortex-X (all-out perf).
  • The Cortex-X Custom (CXC) program lets key customers pay for a bigger-than-standard area budget: wider fetch, a larger ROB, a bigger L2 and a larger branch predictor. X1 was codenamed Hera.
  • X1 (2020) — 5-wide decode, 8-wide issue, 1 MB L2. Launched in Exynos 2100 & Snapdragon 888.
  • X2 (2021): first Armv9-A Cortex-X, and AArch64-only (no AArch32 at any exception level).
  • X3 (2022) — 6-wide decode. Mediatek Dimensity 9200, Snapdragon 8 Gen 2.
  • X4 (2023): 10-wide dispatch, 384-entry ROB. Dimensity 9300 (4 × X4 + 4 × A720, no little core).
  • X925 (2024, codename Blackhawk) — 10-wide decode, ~15% IPC over X4. New naming ("9" = premium, "25" = 2025 platform).

Why a separate X family?

The Austin A-flagship cadence was constrained by mid-range power/area budgets. Customers like Samsung, Qualcomm, and MediaTek wanted "go-big" cores for benchmarks. X lets Arm ship those without bending the whole A-line.

Cortex-X is still Arm IP

Not to be confused with Nuvia / Oryon / Apple-style "architecture licensees" who build their own microarchitectures for the Arm ISA. Cortex-X comes as an Arm-delivered RTL drop.

10

Armv9-A — The 2021 Generation

  • Announced March 2021. Same ISA lineage as Armv8-A (Armv9.0-A is Armv8.5-A plus new features in practice). The headline changes:
  • SVE2 (Scalable Vector Extension v2): mandatory. Brings vector-length-agnostic SIMD, beyond fixed 128-bit NEON, to mobile.
  • MTE (Memory Tagging Extension): still architecturally optional, but implemented on Arm's Armv9 Cortex cores.
  • RME / CCA (Realm Management Extension, Confidential Compute Architecture, from Armv9.2-A): a new "Realm" world alongside Secure, Non-secure and Root.
  • MPAM (resource partitioning and monitoring), BRBE (branch record buffer) and TRBE (trace buffer) for QoS and profiling.
  • First Armv9 cores: Cortex-X2, A710, A510 (the "Matterhorn" platform).
  • X2 / A710 / A510 shipped in 2022's Snapdragon 8 Gen 1 and Dimensity 9000.
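What "scalable" means in practice: SVE code is written once, with no hard-coded vector width. Each loop iteration handles however many lanes the hardware provides, and a predicate (built by an instruction such as `WHILELT`) switches off the lanes past the end of the array, so no scalar tail loop is needed. The Python below only mimics that vector-length-agnostic pattern for 32-bit elements; the function name is illustrative, and real code would use the ACLE SVE intrinsics or compiler auto-vectorisation. Arm's Cortex SVE2 implementations use 128-bit vectors, while A64FX uses 512-bit ones; the same loop works for both.

```python
def vla_add(a: list[float], b: list[float], vl_bits: int) -> list[float]:
    """Element-wise a + b, strip-mined the way SVE code is, for a given vector length.

    vl_bits must be a multiple of 128 in the 128-2048 range, as SVE allows.
    """
    assert vl_bits % 128 == 0 and 128 <= vl_bits <= 2048
    lanes = vl_bits // 32                        # 32-bit elements per vector register
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:                                 # continue while any lane is still in range
        pred = [i + j < n for j in range(lanes)]  # WHILELT-style predicate
        for j, active in enumerate(pred):
            if active:                           # inactive lanes do nothing: free tail handling
                out[i + j] = a[i + j] + b[i + j]
        i += lanes                               # INCW-style: advance by the hardware lane count
    return out

a = [float(x) for x in range(10)]
b = [1.0] * 10
# The same "binary" gives identical results at mobile-style 128-bit and A64FX-style 512-bit widths.
assert vla_add(a, b, 128) == vla_add(a, b, 512)
```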

Armv9-A != new ISA

Marketing made it sound like a clean break. In reality it's mostly "take Armv8.5-A, mandate SVE2, add CCA (from v9.2-A)". Existing Armv8-A software continues to run. The name is a branding reset, useful for customers too after a decade of "Armv8.0 through v8.7".
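Because Armv9-A cores still run Armv8-A binaries, software that wants SVE2 or MTE usually probes for them at run time and keeps a NEON or scalar fallback. On arm64 Linux the kernel lists the detected features on the `Features` line of `/proc/cpuinfo` (native code would more typically call `getauxval(AT_HWCAP2)`). The parser below is a minimal sketch that takes the file's text as input so it can be tried anywhere; the sample line is abbreviated and illustrative, not copied from a specific device, and `pick_kernel` is a hypothetical dispatch helper.

```python
def cpu_features(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            return set(value.split())
    return set()

def pick_kernel(features: set[str]) -> str:
    """Hypothetical dispatch: prefer SVE2, fall back to NEON, then plain scalar code."""
    if "sve2" in features:
        return "sve2"
    if "asimd" in features:   # Linux reports NEON (Advanced SIMD) as 'asimd'
        return "neon"
    return "scalar"

sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes crc32 atomics sve sve2 mte\n"
print(sorted(cpu_features(sample) & {"sve2", "mte"}), pick_kernel(cpu_features(sample)))
# ['mte', 'sve2'] sve2
```

On a real arm64 Linux device, pass `open("/proc/cpuinfo").read()` instead of the sample string.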

AArch32 starts dying

X2 dropped AArch32 entirely, while A710 and A510 kept it only at EL0 (user space). A715 (2022) and A520 (2023) finished the job, so 32-bit Android apps had to migrate: the cut-off that ended "32-bit phones".

11

Today's Flagships — A520 / A720 / A725 / X925 (2024)

Core | Role | Year | Armv | Notes
Cortex-A520 | LITTLE | 2023 | v9.2 | Dual-core complex (shared L2 / FPU) for area. AArch64-only.
Cortex-A720 | middle | 2023 | v9.2 | Successor to A715. IPC ~10% over A715, power reduced.
Cortex-A725 | middle+ | 2024 | v9.2 | Incremental A720 refresh ("Chaberton").
Cortex-X925 | big | 2024 | v9.2 | "Blackhawk". 10-wide decode, huge BTB, new TAGE-SC-L-style predictor.
Cortex-A320 | ultra-efficient | 2025 | v9.2 | First Armv9-A ultra-efficient core, aimed at IoT and embedded Linux-class devices; successor to the A35 class.

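New naming scheme (2024 onwards): the first digit is the tier and the last two digits the platform year ("925" = tier 9, 2025 platform). Samsung's Exynos 2500 pairs X925 with A725 and A520; MediaTek's Dimensity 9400 combines X925 with X4 and A720 cores; Qualcomm's Snapdragon 8 Elite (announced in place of "8 Gen 4") uses Qualcomm's own Oryon cores instead.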

12

big.LITTLE → DynamIQ (2017)

  • big.LITTLE limitation: two fixed-size clusters, each with its own L2. OS must migrate threads between clusters to go fast ↔ low-power. Coherency via an external CCI.
  • DynamIQ (2017, introduced with Cortex-A75/A55): single cluster, up to 8 heterogeneous cores, a shared L3, per-core asynchronous voltage/frequency.
  • The coherence fabric — the DynamIQ Shared Unit (DSU) — absorbs what CCI used to do plus gives a shared L3.
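  • Enables the iconic 1 + 3 + 4 layout (1 × X, 3 × A-mid, 4 × A-little) inside a single DSU cluster. Under big.LITTLE each core type needed its own cluster, L2 and interconnect port, so three-tier designs were rare and awkward.
  • DSU-110 (2021) scales to 8 cores and up to 16 MB of L3. DSU-120 (2023) raises the limit to 14 cores per cluster.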
Figure: DynamIQ 1+3+4 cluster: 1 × Cortex-X925, 3 × Cortex-A725 (mid) and 4 × Cortex-A520 (little) behind a DSU-120 (shared L3 with SCU, L3 slices, snoop filter and AMBA 5 CHI egress), connected through CMN / NIC-600 to DRAM, GPU and NPU.
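Why per-core voltage/frequency control matters can be shown with a first-order estimate. Dynamic power scales as P ∝ C·V²·f; assuming, as a rough approximation, that voltage scales linearly with frequency gives P ∝ f³. The sketch compares one shared cluster clock (the usual big.LITTLE arrangement) with independent per-core clocks. The demand figures are hypothetical, and shipping DynamIQ SoCs often group same-type cores into one domain rather than giving every core its own.

```python
def relative_power(freqs_ghz: list[float]) -> float:
    """Sum of per-core dynamic power in arbitrary units, assuming P ~ f**3 (V ~ f)."""
    return sum(f ** 3 for f in freqs_ghz)

def shared_domain(requests_ghz: list[float]) -> list[float]:
    """One clock for the whole cluster: every core runs at the highest request."""
    top = max(requests_ghz)
    return [top] * len(requests_ghz)

def per_core_domains(requests_ghz: list[float]) -> list[float]:
    """Independent domains: each core runs at its own requested frequency."""
    return list(requests_ghz)

# Hypothetical demand: one core needs 3 GHz, seven light threads need 1 GHz.
requests = [3.0] + [1.0] * 7
print(relative_power(shared_domain(requests)))     # 216.0
print(relative_power(per_core_domains(requests)))  # 34.0
```

In this toy case the shared clock costs over 6× the dynamic power to meet the same per-core demand. The model ignores leakage, race-to-idle and voltage floors, all of which narrow the gap in practice.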
13

Pick a Cortex-A Core

A8 · A9 · A15 · A53 · A57 · A72 · A76 · A78 · X1 · A520 · A725 · X925
14

Timeline — 2002 to 2024

2002
ARM1136JF-S: first Armv6 core (TrustZone followed with ARM1176JZF-S)
2005
Cortex-A8 announced — first Armv7-A, first NEON
2007
Cortex-A9 MPCore — first Arm OoO, first multi-core Cortex
2011
big.LITTLE — A15 + A7 paired via CCI-400
2012
Cortex-A53 / A57 announced: Arm's first Armv8-A (AArch64) cores; Armv8-A itself was unveiled in October 2011
2014
First Arm-IP Armv8 silicon ships (Exynos 5433 in late 2014; Snapdragon 810 followed in 2015), about a year after Apple A7
2017
DynamIQ launched with Cortex-A75 / A55 — replaces big.LITTLE
2018
Cortex-A76 — first "laptop-class" Arm core; becomes Neoverse N1
2020
Cortex-X1 — first Custom-Core deliverable
2021
Armv9-A announced; Cortex-X2 / A710 / A510, with SVE2 mandatory
2023
Cortex-A520 drops AArch32 at all ELs — 32-bit Android apps must migrate
2024
Cortex-X925 "Blackhawk" — 10-wide decode, new 3-digit naming
15

Fujitsu A64FX & Apple Silicon — the Outliers

Fujitsu A64FX (2019)

Custom Armv8.2-A core with the first-ever SVE implementation (512-bit fixed width). It powers the Fugaku supercomputer, #1 on the Top500 in 2020-21. Not a Cortex-A, but it proved SVE at scale and likely helped convince Arm to make SVE2 mandatory in v9-A.

Apple Silicon (2013-)

Cyclone (A7) → Typhoon (A8) → ... → Everest / Sawtooth (A16) → A17 Pro → A18 Pro (2024). 6-wide decode from Cyclone, 8-wide by Firestorm (A14 / M1). Apple took out an Architectural Licence, not a Cortex IP licence: all these Apple cores implement the Armv8-A / v9-A ISA but contain not a single line of Arm Cortex RTL.

  • The architecture-licence model is how Apple's cores, Qualcomm's Nuvia-derived Oryon and Ampere's AmpereOne exist. Ampere Altra and NVIDIA Grace, by contrast, license Arm's Neoverse cores; Cortex-X (like Neoverse) is for customers who don't want to build a CPU from scratch.
  • A64FX showed the server world that Arm could do HPC; Apple showed the phone world that Arm could do laptop-class perf. Both motivated the modern Cortex-X + Neoverse investment.
16

Lessons

  • "Why split Cortex-A and Cortex-X?" → annual flagship cadence, but A is constrained by mid-range power/area; X is Arm's "go big" sibling at premium SoC die budget.
  • "Why did AArch64 matter for phones?" → not address space: 31 × 64-bit registers (often credited with ~30% fewer compiler spills), plus a more rigorously specified memory model.
  • "What killed big.LITTLE?" → the rigid two-cluster layout. DynamIQ's single cluster, shared L3 and 1+3+4 layout suit modern phones far better.
  • "Why is Apple ahead?" → architecture licence plus wide cores from 2013 (6-wide Cyclone, 8-wide Firestorm by 2020). Arm's own cores only reached comparable width with X4 / X925.
  • "Is AArch32 dead?" → in current flagship cores, yes: Arm's big and mid cores went AArch64-only between X2 (2021) and A715 / A520 (2022-23), although Armv9-A itself still permits AArch32 at EL0. It lives on in low-cost A53 / A35 designs, and 32-bit Android apps cannot run on AArch64-only cores.
  • "Why does Arm ship middle cores?" → the A720 / A725 "middle" tier handles day-to-day sustained loads that would thermally throttle an X-core but are too heavy for an A520. Modern phones reportedly spend around 60% of active time on the middle cores.
  • "Neoverse vs Cortex-A?" → Neoverse N1 was derived from the A76 microarchitecture. Neoverse cores are Cortex-A-derived designs built for CMN mesh fabrics, with server-grade features (MPAM, RAS, SBSA compliance).
17

Cortex-A Roadmap — Where It's Going

  • AArch64-only everywhere: every new Cortex-A design since the 2023 generation (A520 / A720 / X4) is AArch64-only, and by ~2026 AArch32 should survive only in legacy licensed cores such as A53 / A55.
  • SME (Scalable Matrix Extension): Armv9.2-A+. Streaming-mode matrix/tile instructions — converges with Apple AMX. Expected in Cortex-X next-gen.
  • Confidential Compute (CCA): currently "server-oriented" but will land in mobile for protected-media and on-device AI by 2026.
  • Android laptops & Windows-on-Arm: Cortex-X925-class IP targets 15-30 W power envelopes, the territory of thin-and-light laptops such as the MacBook Air.
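SME's central operation is an outer product accumulated into a 2-D tile (the ZA array): two vectors are multiplied into a matrix that is added to the tile. A matrix multiply then becomes a sequence of rank-1 updates, one per element of the shared dimension. The plain-Python sketch below shows only that arithmetic decomposition, with illustrative names, and checks it against the textbook triple loop; real code would use SME instructions such as `FMOPA` through intrinsics or a library.

```python
def matmul_outer(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """C = A @ B as a sum of outer products (rank-1 updates), the pattern SME's
    outer-product-and-accumulate instructions speed up. A is m x k, B is k x n."""
    m, k, n = len(a), len(b), len(b[0])
    tile = [[0.0] * n for _ in range(m)]   # plays the role of a ZA tile
    for p in range(k):
        col = [a[i][p] for i in range(m)]  # column p of A
        row = b[p]                         # row p of B
        for i in range(m):                 # one outer product, accumulated into the tile
            for j in range(n):
                tile[i][j] += col[i] * row[j]
    return tile

def matmul_naive(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Textbook dot-product formulation, for comparison."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_outer(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

On SME hardware the two inner loops collapse into one outer-product instruction per step of the shared dimension, operating on a whole tile at once.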

The automotive A-class push

Arm's AE variants (A76AE, A78AE) add ASIL-B/D safety (split-lock cores, parity, ECC). The next Cortex-AE tier is the revenue-growth story alongside Neoverse in Arm's post-2023 IPO strategy.

Arm's 2025 challenge

Apple Silicon keeps widening the decode, and Arm's X925 has only just reached roughly M1-era width. The race for wider, higher-IPC cores that sustain their performance within a phone's ~5 W budget is what the next five years of Cortex-X are about.


Educational use.