ARM CORTEX-M · PRESENTATION 01

History of Arm & the Architecture

From the BBC Micro to Armv9 · Four decades of one RISC design
Acorn · ARM1 · Apple Newton · ARM7TDMI · Cortex-A/R/M · Helium · TrustZone · Armv9
02

Why the History Matters

  • Many Cortex-M design choices only make sense once you understand what Arm was in 1985, 1994, and 2004.
  • The Thumb-only M-profile, the fixed memory map, the NVIC priority model — all are reactions to specific historical pain points.
  • The IP licensing model Arm pioneered is why the same core designs ship from 220+ licensees in tens of thousands of MCU SKUs today.
  • Interview panels love "why is it like this?" questions — a quick history of the architecture is often the cleanest answer.
"We didn't actually have enough money to design a chip in the usual sense, so we had to do it in a different way." — Steve Furber, co-designer of the ARM1 (Cambridge University, 2012 interview)

The original Arm was a frugality-driven design. That DNA — simplicity, orthogonality, predictable timing — is still visible in every Cortex-M today.

03

Prehistory — Acorn Computers

  • Acorn Computers founded in Cambridge (UK), 1978 by Chris Curry, Hermann Hauser, and Andy Hopper.
  • Built the Acorn System 1 (6502-based) and then won the BBC Micro contract in 1981 — 1.5 M units sold, one per UK classroom.
  • Acorn's next-gen PC needed a faster CPU. They evaluated the Motorola 68000 (long, uninterruptible instructions gave poor interrupt latency) and the National Semiconductor NS32016 (slow and poor at using memory bandwidth), and approached Intel about the 80286 (Intel declined to license it).
  • After reading papers on RISC from Berkeley (Patterson, RISC-I 1981) and Stanford (MIPS, 1983), Sophie Wilson and Steve Furber proposed designing Acorn's own RISC processor.

The RISC insight

Berkeley's RISC-I (44k transistors) outperformed the commercial 68000 (68k transistors) on compiled C code. A small team with no fabrication budget could compete with Motorola and Intel by making the design radically simpler.

Constraints bred virtues

Acorn could not afford a full EDA tool flow or a dedicated fab. The whole architecture had to be simulated on a BBC Micro in BBC BASIC before any silicon could be taped out — forcing simplicity.

04

ARM1 — First Silicon, April 1985

  • Team of ~6 people, led by Sophie Wilson (ISA) and Steve Furber (microarchitecture).
  • VLSI Technology fabbed it on a 3 µm process. About 25,000 transistors: several times the 6502's roughly 3,500, but far fewer than the 68000's ~68,000.
  • 32-bit architecture, 32-bit registers, 26-bit program counter packed into R15 with status flags.
  • 3-stage pipeline: Fetch — Decode — Execute. Every instruction conditional. Barrel shifter in the ALU path "for free".
  • Famous story: the first chips drew so little power (under 0.1 W) that when the designers went to measure the supply current they found the supply was not connected; the chip was running on current leaking in through the protection diodes on its I/O pins.
ARM1 (1985)
Process ........... VLSI 3 µm CMOS
Transistors ....... ~25,000
Die area .......... ~50 mm²
Clock ............. 6 MHz
Registers ......... 16 × 32-bit
Pipeline .......... 3-stage
Instruction count . ~44
Power ............. < 0.1 W
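The "26-bit program counter packed into R15 with status flags" can be made concrete with a small model. This is a sketch under the usual description of the 26-bit-era layout (flags in the top four bits, two interrupt-disable bits, a word-aligned PC, two mode bits); the function names `pack_r15` and `unpack_r15` are illustrative, not taken from any Arm tool.

```python
# Sketch of the 26-bit-era R15 layout (ARM1/ARM2/ARM3), as commonly documented:
#   bits 31..28  N Z C V condition flags
#   bits 27..26  I F interrupt-disable bits
#   bits 25..2   word-aligned program counter
#   bits  1..0   processor mode
MODES = ("USR", "FIQ", "IRQ", "SVC")
FLAG_BITS = tuple(zip((31, 30, 29, 28), "NZCV"))


def pack_r15(pc, flags="", irq_off=False, fiq_off=False, mode="USR"):
    """Pack a byte address, flag letters and a mode name into one 32-bit word."""
    if pc % 4 or not 0 <= pc < (1 << 26):
        raise ValueError("PC must be word-aligned and below 64 MiB")
    word = pc | MODES.index(mode)
    for bit, name in FLAG_BITS:
        if name in flags:
            word |= 1 << bit
    word |= (int(irq_off) << 27) | (int(fiq_off) << 26)
    return word


def unpack_r15(word):
    """Split a 32-bit R15 value back into its fields."""
    return {
        "pc": word & 0x03FFFFFC,
        "flags": "".join(name for bit, name in FLAG_BITS if (word >> bit) & 1),
        "irq_off": bool((word >> 27) & 1),
        "fiq_off": bool((word >> 26) & 1),
        "mode": MODES[word & 3],
    }
```

One consequence of this packing is that saving R15 on a call or exception preserved the flags and mode along with the return address, at the price of a 64 MiB address space.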

ARM = Acorn RISC Machine — renamed to Advanced RISC Machines in 1990.

05

ARM2 / ARM3 — Commercial Products

ARM2 (1986)

  • First Arm to ship in a commercial product: the Acorn Archimedes A305 (June 1987) — the world's first RISC-based desktop computer.
  • 8 MHz, ~30,000 transistors, 4 MIPS — out-ran a 16 MHz 80386 on integer code.
  • No cache, no MMU, no FPU. Ran the Arthur/RISC OS at desktop-responsive speed.

ARM3 (1989)

  • Added a 4 KB unified cache on-die — first Arm to have any cache.
  • 25 MHz, ~310,000 transistors.
  • Last Acorn-designed Arm before the Apple spin-out.

Archimedes in context

In 1987 the competition was the Amiga 500 (68000 @ 7 MHz), the Atari ST (68000), and early 80286 PCs. An 8 MHz Archimedes was benchmarked at 3–5× the CPU performance of all of them on FP-free workloads.

Despite the technical win, Acorn's total volume was a rounding error next to IBM-PC clones. This set up the pivotal move that turned Arm from "a cool British chip" into the world's most widely-licensed CPU: Apple.

06

1990 — The Apple / VLSI Spin-Out

  • Apple was developing the Newton MessagePad — needed a low-power 32-bit CPU.
  • The Newton team (Larry Tesler) evaluated Arm and loved it, but Apple was reluctant to depend on Acorn, a small rival PC maker with uncertain long-term prospects.
  • November 1990: Apple, Acorn, and VLSI Technology jointly founded Advanced RISC Machines Ltd, with 12 engineers working out of a converted barn near Cambridge.
  • The new Arm was a pure IP licensing company: no fab, no products, just designs sold to whoever wanted to build silicon.
  • Apple invested £1.5M for about 43% of the new company; Acorn took a similar stake in exchange for its IP and engineers, and VLSI held the small remainder.
"I had a piece of A4 paper with five bullets on it. 'Don't just be a chip company; license the design, let others make the chips.' That was the plan." — Robin Saxby, first CEO of ARM, recounting 1991 (Arm 30-year anniversary interview, 2020)

The licensing revolution

Before Arm, CPUs came from Intel, Motorola, IBM, or DEC — each vertically integrated. Arm's model let any semiconductor company ship a competitive 32-bit CPU by licensing a well-supported IP block. Fifteen years later, nearly every mobile-phone chip vendor was an Arm licensee.

07

ARM6 → ARM7TDMI — The Mobile Break

ARM6 (1992) → ARM7 (1994)

  • ARM6 was used in the Newton MessagePad.
  • ARM7 kept the classic 3-stage pipeline and raised the clock speed.
  • Crucially, Arm introduced Thumb — a 16-bit compressed instruction set — in 1994.

ARM7TDMI (1994)

  • T = Thumb, D = JTAG Debug, M = fast (long) Multiplier, I = EmbeddedICE.
  • Nokia picked it for the 6110 (1998); Texas Instruments put it at the heart of its GSM baseband chips.
  • By 2000, ARM7TDMI was in hundreds of millions of mobile phones — the single most-shipped 32-bit core of the 20th century.

Why Thumb mattered

16-bit Thumb encoded the most common instructions in half the space. On a phone with 256 KiB of expensive masked ROM, Thumb reduced code size by ~30%, so the same firmware fit in a smaller, cheaper ROM.

The core switched between ARM mode (32-bit) and Thumb mode (16-bit) at runtime via BX.

The Cortex-M series would later take Thumb so far that it became the only mode — A32 simply doesn't exist in M-profile.
08

Architecture Versions — Interactive

Armv1 · Armv2 · Armv3 · Armv4 · Armv5 · Armv6 · Armv7 · Armv8 · Armv9
Armv7 (2004–2014)
Profiles introduced: Armv7-A · Armv7-R · Armv7-M · Armv7E-M
  • First Arm architecture to be split into profiles — Application, Real-time, Microcontroller.
  • Armv7-M debuted with the Cortex-M3 (announced 2004, silicon from 2006): Thumb-only, NVIC, optional MPU.
  • Thumb-2 extended Thumb with 32-bit encodings; full parity with A32.
  • NEON SIMD added on the A-profile side (Cortex-A8, 2007).
  • LPAE, TrustZone (A-profile), Virtualisation extensions added during v7 lifetime.
09

Profile Split — The 2004 Decision

  • By the early 2000s, a single Arm architecture line (implemented by the ARM7/ARM9/ARM11 core families) had to serve phones, MCUs, and automotive, with servers on the horizon.
  • Each market pulled in a different direction:
    • Phones/servers — caches, MMU, virtualisation.
    • Automotive/industrial — deterministic timing, lockstep CPUs.
    • MCUs — small die, fast IRQ, no MMU, low power.
  • Arm's answer: split Armv7 into three profiles.
Profile            | v7 flagship      | Niche
A  Application     | Cortex-A8 (2007) | Phones, set-top, later servers
R  Real-time       | Cortex-R4 (2006) | Baseband, automotive, storage
M  Microcontroller | Cortex-M3 (2004) | Deeply-embedded MCUs
Cortex-M3 announced first (October 2004): Arm's deliberate signal that the microcontroller market was being taken seriously. It went on to displace 8-bit (8051, HC08) and 16-bit (MSP430) families from a large share of new MCU designs.
10

Birth of Cortex-M

2004
Cortex-M3 announced — first Armv7-M core. Thumb-only, NVIC, MPU optional. Luminary Micro (later acquired by TI) was lead licensee with the LM3S series.
2006
STMicro licenses Cortex-M3 for the STM32F1 family — launched 2007. First mass-market 32-bit MCU with a rich ecosystem.
2007
NXP (formerly Philips Semi) licenses Cortex-M3 for the LPC1700. The "LPC vs STM32" rivalry begins.
2009
Cortex-M0 released (Armv6-M, the architecture the FPGA-oriented Cortex-M1 had introduced in 2007). At ~12k gates, the M0 became the smallest 32-bit CPU Arm had shipped. It sharply weakened the case for new 16-bit MCU designs.
2010
Cortex-M4 released — Armv7E-M. Adds DSP SIMD + optional single-precision FPU. The default DSP-adjacent MCU for a decade.
2012
Cortex-M0+ — 2-stage pipeline revision of M0. Single-cycle I/O port, micro-trace buffer. Massive in wireless & IoT.
2014
Cortex-M7 released: Armv7E-M with 6-stage dual-issue pipeline, caches, TCM, FPv5. STM32F7 launched at 216 MHz; later STM32H7 parts reach up to 600 MHz.
2016
Cortex-M23 and M33 — first Armv8-M cores. TrustZone for MCUs arrives. PSA / TF-M launches.
2018
Cortex-M35P: physical-security-hardened variant. Targets smart-card / EMV.
2020
Cortex-M55 — Armv8.1-M, first core with Helium (MVE). TinyML on MCU becomes practical.
2022
Cortex-M85 — flagship. 7-stage dual-issue, PACBTI, Helium. ~6.3 CoreMark/MHz.
2023
Cortex-M52 (announced November 2023): mid-range Helium-capable core, targeting the M33 replacement slot with built-in MVE.
11

Why Thumb-Only in M-Profile?

Thumb-1 (1994)

  • 16-bit encoding of a subset of A32 instructions.
  • Code density ~30% better than A32; performance ~70%.
  • Phones used Thumb to save ROM mask costs.

Thumb-2 (2003)

  • Added 32-bit encodings for instructions that wouldn't fit in 16 bits.
  • A single compiler output could reach nearly all A32 functionality while keeping code size close to that of 16-bit Thumb.
  • Thumb-2 could replace A32 entirely.
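How a decoder tells 16-bit and 32-bit encodings apart in one instruction stream can be sketched in a few lines. The rule used here is the one given in the Armv7-M documentation: a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction. The helper names are illustrative only.

```python
def thumb_insn_len(first_halfword):
    """Return the instruction length in bytes, given its first 16-bit halfword."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_stream(halfwords):
    """Group a list of halfwords into per-instruction tuples."""
    out, i = [], 0
    while i < len(halfwords):
        n = thumb_insn_len(halfwords[i]) // 2
        out.append(tuple(halfwords[i:i + n]))
        i += n
    return out


# BX LR (16-bit), BL <label> (32-bit), NOP (16-bit)
stream = [0x4770, 0xF000, 0xF800, 0xBF00]
print(split_stream(stream))
```

Because the length is decided by the first halfword alone, 16-bit and 32-bit instructions can be mixed freely with no state switch, which is what let Thumb-2 stand in for A32.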

M-profile decision (2004)

  • Strip out A32 state altogether. No mode switch. No CPSR.T bit toggling.
  • All exception handlers and interrupt vectors are Thumb.
  • Benefits:
    • Simpler pipeline (one decoder).
    • Smaller die area.
    • Uniformly dense code — crucial when flash is 32–64 KB.
    • Deterministic code size for ROM vendors.
Consequence for today: when a Cortex-M core comes out of reset and loads the reset vector (the address of Reset_Handler) into PC, bit 0 of that vector must be 1 to mark Thumb state, even though Thumb is the only state. For the same reason, bit 0 of any function pointer is always 1.
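The bit-0 convention is easy to see in a toy model of a vector table. This is a sketch, not a linker: the addresses are made up for illustration, and `vector_entry`, `load_vector` and `build_table` are hypothetical helper names.

```python
import struct


def vector_entry(handler_addr):
    """Vector-table word for a handler: its (even) code address with bit 0 set."""
    if handler_addr & 1:
        raise ValueError("handler code must be halfword-aligned")
    return handler_addr | 1


def load_vector(word):
    """Model of what the core derives from a vector word: (pc, thumb_bit)."""
    return (word & ~1) & 0xFFFFFFFF, word & 1


def build_table(initial_sp, handlers):
    """Pack the initial SP followed by the handler vectors as little-endian words."""
    words = [initial_sp] + [vector_entry(h) for h in handlers]
    return struct.pack("<%dI" % len(words), *words)


# Hypothetical addresses: 20 KiB of RAM at 0x20000000, Reset_Handler at 0x08000130.
table = build_table(0x20005000, [0x08000130])
print(table.hex())
```

A vector word with bit 0 clear would ask the core to start executing with the Thumb bit cleared, which an M-profile core cannot do, so it faults on the first instruction. Toolchains set the bit automatically for function symbols, which is why it rarely needs hand-editing.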
12

Shipments — the Cortex-M Curve

[Chart: cumulative Arm-based chips shipped (approx.), x-axis 2005–2024, y-axis 0–40 B. Data labels: 10 B cumulative (2013); ~18 B (2017, M-profile ≈ 2/3); ~28 B (2020); ~40 B (2022); > 50 B (2024).]

Figures from Arm's investor presentations and the 2023 IPO prospectus. Cortex-M represents the largest shipment slice (by unit count); Cortex-A represents the largest revenue slice.

13

Inflection Points in M-Profile

Thumb-2 everywhere (2004)

Dropping A32 in the microcontroller profile was a one-way choice — once made, every toolchain and every RTOS simplified. Nobody regretted it.

NVIC in the core (2004)

Integrating the interrupt controller into the CPU with a fixed register map meant driver code could be written against "the NVIC" instead of "STMicro's interrupt controller". This portability cemented Cortex-M's dominance.
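What a "fixed register map" buys in practice: the location of an interrupt's enable bit and priority byte follows from the IRQ number alone, on any vendor's part. The sketch below uses the NVIC base addresses from the Armv7-M documentation (set-enable registers at 0xE000E100, priority registers at 0xE000E400, byte-addressable on Armv7-M); the function names are illustrative.

```python
NVIC_ISER_BASE = 0xE000E100  # interrupt set-enable registers, 32 IRQs per word
NVIC_IPR_BASE = 0xE000E400   # interrupt priority registers, one byte per IRQ


def enable_location(irqn):
    """Return (register address, bit mask) that enables external interrupt irqn."""
    return NVIC_ISER_BASE + 4 * (irqn // 32), 1 << (irqn % 32)


def priority_address(irqn):
    """Return the address of the priority byte for external interrupt irqn."""
    return NVIC_IPR_BASE + irqn


addr, mask = enable_location(37)
print(hex(addr), hex(mask), hex(priority_address(37)))
```

Only the meaning of IRQ 37 differs between vendors; the arithmetic above does not, which is what lets an RTOS port or CMSIS helper be written once.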

DSP extension (2010)

Cortex-M4's SIMD and saturating math unlocked audio & motor-control for MCUs. Before M4, these tasks typically needed a dedicated DSP (TI C28x, Analog Devices Blackfin).
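Why saturating math matters for audio and control loops shows up in a two-line comparison. The sketch below models the behaviour in plain Python: `ssat` clamps to a signed range, `sat_add16` is a single-lane saturating add, and `wrap_add16` is ordinary two's-complement addition. The names are illustrative; on the M4 the clamping is done by instructions such as SSAT and QADD.

```python
def ssat(value, bits):
    """Clamp value to the signed range of the given width; return (result, saturated)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


def sat_add16(a, b):
    """Saturating add of two signed 16-bit samples."""
    return ssat(a + b, 16)[0]


def wrap_add16(a, b):
    """Ordinary two's-complement 16-bit add, for comparison."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


# Mixing two loud samples: saturation clips, wraparound flips the sign.
print(sat_add16(30000, 10000), wrap_add16(30000, 10000))
```

Clipping at full scale is audible but benign; a sign flip is a loud click in audio and a torque reversal in a motor loop, so without hardware saturation every add needs an explicit range check.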

TrustZone-M (2016)

Bringing secure/non-secure state to MCUs made PSA-certified IoT / payment / medical possible without a separate secure element chip.

Helium MVE (2019)

128-bit beat-wise vector with int8 + FP16 lanes — explicitly designed for TinyML inference. 5× speedup on CNN kernels vs DSP intrinsics on M4.
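The lane and beat arithmetic behind "128-bit beat-wise vector" can be sketched as follows. It assumes the usual description of MVE, in which each vector instruction is split into four beats of 32 bits; the function names are illustrative and the model covers only a wrapping integer add.

```python
VECTOR_BITS = 128
BEATS_PER_INSN = 4  # assumption: four beats, each covering 32 bits of the vector


def lanes(elem_bits):
    """Number of elements of the given width in one vector register."""
    return VECTOR_BITS // elem_bits


def vadd_by_beats(a, b, elem_bits=8):
    """Lane-wise wrapping add, processed one 32-bit beat at a time."""
    n = lanes(elem_bits)
    if len(a) != n or len(b) != n:
        raise ValueError("expected %d lanes" % n)
    per_beat = n // BEATS_PER_INSN
    mask = (1 << elem_bits) - 1
    out = []
    for beat in range(BEATS_PER_INSN):
        lo = beat * per_beat
        out += [(x + y) & mask
                for x, y in zip(a[lo:lo + per_beat], b[lo:lo + per_beat])]
    return out


print(lanes(8), lanes(16))  # int8 lanes and FP16-sized lanes per vector
```

Splitting the work into beats is what lets a small core implement the 128-bit architecture with a narrower datapath, overlapping beats of adjacent instructions instead of building a full-width SIMD unit.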

PACBTI (2022)

Pointer Authentication + Branch Target Identification hardens control flow against ROP/JOP exploits — a first-class defence on MCUs, not just servers.

14

The Licensing Model

  • Arm does not fab chips. It licenses synthesisable RTL (the "core IP") plus tool flows, compilers, and reference designs.
  • Typical licence structure:
    • Per-design licence — paid once per SoC tape-out.
    • Royalty — a few pence per chip, for the life of the product.
  • Arm Flexible Access (2019): a subscription that bundles most Cortex-M cores + tools for start-ups & small teams. Lowered the barrier to a first Cortex-M SoC to ~$75k / year.
  • Cortex-M0 DesignStart (2017) — free-to-license M0 + interrupt controller for academic / evaluation use.
  • 220+ companies license Cortex-M today.

Why it works

A single team at Arm (~150 people for the M-profile CPU group) designs the core. 220+ silicon companies then independently ship tens of thousands of SKUs, reusing Arm's verification, Arm's compiler, Arm's CMSIS headers. Software ecosystem compounds across vendors.

Contrast with x86

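Intel and AMD design and sell their own chips instead of licensing cores to others (Intel also runs its own fabs; AMD has been fabless since 2009). Result: 2 vendors, tight vertical integration, no 8051-like diversity. Arm's model gave it the long tail of MCU vendors x86 never had.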

15

Corporate History — SoftBank, Nvidia, IPO

1990
Advanced RISC Machines Ltd founded (Apple / Acorn / VLSI joint venture), Cambridge; it later became Arm Holdings.
1998
Arm IPO on LSE and NASDAQ. ~£266M market cap.
2016
SoftBank acquires Arm for £24.3 B. Taken private.
2020
Nvidia announces $40 B acquisition of Arm from SoftBank.
2022
Nvidia / SoftBank deal abandoned under UK/EU/US regulatory pressure.
2023
Arm IPO on NASDAQ (Arm Holdings plc, ticker ARM), Sep 2023. ~$54 B valuation at the offer price.
2023
Arm Total Design ecosystem launched — reference flows bundling Arm CPUs with partner chiplet IP for custom silicon.
2024
Arm pivots marketing toward AI — Neoverse for datacentre AI, Cortex-M52/M85 for TinyML at the edge.
16

What Cortex-M Inherits

From ARM1 (1985)

  • Load/store architecture — compute only between registers.
  • 16 general-purpose registers — R0–R15, same layout 40 years later.
  • R13=SP, R14=LR, R15=PC convention unchanged.
  • Pipelined design with in-order execution.

From ARM7TDMI (1994)

  • Thumb — compact 16-bit encoding.
  • Debug — architected debug interface (CoreSight evolved from this).
  • EmbeddedICE: on-chip halting-debug logic (breakpoints and watchpoints over JTAG).

From ARMv5E (1999)

  • DSP extensions — saturating arithmetic, double-MAC (now in Cortex-M4+).

From Armv7-M (2004)

  • NVIC in the core.
  • Thumb-2 full ISA.
  • Profile split — M-profile became its own thing.

From Armv8-M (2016)

  • TrustZone, stack-limit registers, SAU/IDAU.

From Armv8.1-M (2019)

  • Helium (MVE), low-overhead loops, PACBTI.
17

Competitive Landscape Today

Competitor | Position vs Cortex-M | Status (2024/5)
RISC-V | Open ISA alternative; gaining in cost-sensitive / China-sovereign silicon. | Raspberry Pi RP2350 ships with both dual M33 and dual Hazard3 RISC-V cores, selectable at boot. SiFive, Andes, Codasip supply RISC-V MCU cores.
8051 / AVR / PIC | 8-bit; still shipping but new designs are rare. | Ultra-low-cost & legacy markets only.
TI C28x | Dedicated DSP for motor control. | Co-exists with Cortex-M4 in the same SoC; hybrid C2000 parts such as the F2838x pair a C28x with a Cortex-M4.
Microchip PIC32 MIPS | 32-bit MIPS MCU line. | Maintained but not expanding; MIPS-the-company pivoted to RISC-V in 2022.
Infineon TriCore / AURIX | Automotive 32-bit with integrated safety MCU. | Strong in power-train / ADAS; orthogonal to Cortex-M.
RISC-V is the real long-term competitor. The Cortex-M's answer is the CMSIS software moat, the production-hardened NVIC model, and the tool-chain / vendor ecosystem — territory the RISC-V world is still building in 2025.
18

Timeline — One Slide

[Timeline figure, 1985–2024, three tracks. Corporate: Acorn / ARM1, Apple JV (1990), LSE IPO (1998), SoftBank (2016), Arm IPO (2023). A / R profiles: ARM7TDMI, Cortex-A8, Armv8 A64, Armv9 / SVE2. M profile: Cortex-M3, M0 / M4, M7 / M33, M55 (MVE), M85 / M52. Markers: profile split, v8 split.]
19

Lessons for the Interview

  • "Why Thumb-only?" → code density + pipeline simplicity; the ARM7TDMI phone lineage had shown how much ROM cost mattered, and a single decoder also saves silicon area.
  • "Why is the NVIC architectural?" → because Arm's 2004 profile split made the interrupt controller part of the CPU contract; portability of driver code.
  • "Why is the priority field 8 bits but only top N used?" → portability: priorities are MSB-aligned, so a value written for a 4-bit implementation keeps the same relative ordering on an 8-bit implementation without changes.
  • "Why did v8-M bring TrustZone to MCUs?" → PSA & IoT security economics (2014–2016) required root-of-trust isolation without a second chip.
  • "Why did M7 bring caches but M3/M4 didn't?" → flash wait-states became dominant at 300+ MHz; caches hide them. At < 100 MHz, flash with few or no wait-states (often helped by a small vendor prefetch buffer) makes caches unnecessary.
  • "What's the difference between Helium and NEON?" → Helium is 128-bit beat-wise SIMD designed for MCU area; NEON is a full 128-bit SIMD block on A-profile. Different instruction set, different micro-architecture.
  • "Is Cortex-M relevant now?" → M-profile accounts for the majority of Arm units shipped. Every Wi-Fi dongle, smartwatch companion, car door-lock sensor, washing-machine controller ships one.
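The priority-field answer above can be checked with a short model. It assumes the behaviour described in the Armv7-M documentation: the field is 8 bits wide, only the top N bits are implemented, and the unimplemented low bits read back as zero. `stored_priority` and `encode_level` are illustrative names; `encode_level` mirrors the shift that CMSIS-style helpers apply.

```python
def stored_priority(written, implemented_bits):
    """Value the 8-bit field holds after a write: unimplemented low bits read as zero."""
    mask = (0xFF << (8 - implemented_bits)) & 0xFF
    return written & mask


def encode_level(level, implemented_bits):
    """Place a logical level (0 = most urgent) in the top implemented bits."""
    return (level << (8 - implemented_bits)) & 0xFF


# The same three register values written on parts with 2, 4 and 8 priority bits.
values = [0x10, 0x80, 0xC0]
for bits in (2, 4, 8):
    print(bits, [hex(stored_priority(v, bits)) for v in values])
```

Masking off low bits can merge neighbouring levels on a part with fewer bits, but it never inverts their order, so "this interrupt is more urgent than that one" survives a move between implementations.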
20

References

Arm Ltd., Arm Architecture Reference Manual (A-profile, R-profile, M-profile). All freely downloadable.
Steve Furber, ARM System-on-Chip Architecture, 2nd ed. (Addison-Wesley, 2000). History + architecture by one of the original ARM1 designers.
Sophie Wilson & Steve Furber, oral histories, Computer History Museum (CHM) and the British Computer Society archive.
Joseph Yiu, Definitive Guide to ARM Cortex-M3/M4 & Cortex-M23/M33. Chapters on Arm architecture history.
D. Chisnall, "Understanding ARM Architectures" (ACM Queue, 2015).
Arm IPO prospectus (2023). Historical shipment data, revenue splits, licensee counts.
Wikipedia, "ARM architecture family", "Acorn Computers", "ARM Holdings". Surprisingly well-sourced cross-references.
Hauser, Curry, Saxby, interviews in Micro Men (BBC, 2009) and the 30-year Arm anniversary documentary (2020).

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.