ARM CORTEX-M · PRESENTATION 02

Architecture & Programmer's Model

The Cortex-M Family · Armv6-M / v7-M / v8-M · Thumb-2 · Registers & Modes
M0 · M0+ · M3 · M4 · M7 · M23 · M33 · M35P · M52 · M55 · M85
02

Why Cortex-M Matters

  • The default 32-bit MCU ISA. Virtually every major MCU vendor (ST, NXP, Nordic, Silicon Labs, Renesas, Microchip, Infineon, TI, Ambiq, GigaDevice, Raspberry Pi) ships Cortex-M silicon.
  • Volume. Arm partners ship tens of billions of Arm-based chips per year; Cortex-M microcontrollers are the largest slice.
  • Span. One family from a $0.10 Cortex-M0 to a 600 MHz cache-equipped Cortex-M85 — same toolchain, same CMSIS headers.
  • Ubiquity. Also lives inside bigger chips: as a secure enclave (Cortex-M33), a wake-up controller, or a peripheral offload engine in many A-profile SoCs.

Interview context

Cortex-M is the one architecture an embedded-firmware engineer is almost guaranteed to be tested on. Expect questions on the exception model, memory map, NVIC, MPU, and increasingly TrustZone.

"The M-profile is the only 32-bit architecture designed from the ground up for deterministic interrupt latency and bit-level peripheral control — everything else is a general-purpose CPU with interrupts bolted on."
— Paraphrasing Joseph Yiu's design philosophy
03

Arm Profiles: A, R, M

Profile | Target | Memory | OS | Examples
A (Application) | Smartphones, servers, laptops | MMU with virtual memory (VMSA) | Linux, Android, Windows, iOS | Cortex-A78, Neoverse N2, Apple M-series
R (Real-Time) | Automotive, baseband, storage | MPU (protection, no virtual memory) | AUTOSAR, QNX, bare-metal | Cortex-R52, R82
M (Microcontroller) | Deeply embedded, IoT, sensors | Optional MPU · physical addresses only | FreeRTOS, Zephyr, bare-metal | Cortex-M0+, M4, M33, M85

What makes M-profile different

  • Thumb-only — no A32/A64 mode; the core starts in Thumb state and stays there.
  • NVIC is architectural, not an SoC add-on. Every Cortex-M has one, and CMSIS-CORE exposes it through the same API on every core.
  • Exception entry in hardware: automatic stacking of the caller-saved register set.
  • No virtual memory. Addresses go directly to the bus.

What M-profile does not have

  • No A64 (AArch64) instructions — 32-bit only.
  • No virtualisation extensions (EL2).
  • No SMP coherency protocol. Multi-core Cortex-M is asymmetric.
  • Caches are optional and appear only on the higher-end cores (M7, M35P instruction cache, M52, M55, M85).
04

The Cortex-M Family Timeline

[Figure: Cortex-M family timeline, 2005–2024, grouped by architecture. v6-M: M0, M0+, M1 · v7-M: M3, M4, M7 · v8-M: M23, M33, M35P · v8.1-M (+Helium): M55, M85.]

Dates indicate initial announcement. v8.1-M (Helium/MVE) layered on top of v8-M Mainline adds the vector extension used by M52 · M55 · M85.

05

Choose a Core — Interactive

Same architecture family, vastly different trade-offs. Headline features of the flagship core:

Cortex-M85
Armv8.1-M Mainline · Helium (MVE) · 7-stage superscalar · PACBTI · Cache · TrustZone
  • Arm's current flagship M-profile — ~6.3 CoreMark/MHz, ~4 DMIPS/MHz.
  • Dual-issue in-order pipeline with branch prediction.
  • Full Helium (MVE) vector unit — int8/int16/int32/FP16/FP32.
  • Optional PACBTI (pointer auth + branch-target identification) for CFI.
  • Target: ML-on-MCU, high-end real-time control, audio DSP.
06

Cortex-M Lineup at a Glance

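Core | Arch | Pipeline | DMIPS/MHz | DSP | FPU | Helium | TrustZone | Cache
M0 | v6-M | 3-stage | 0.9 | – | – | – | – | –
M0+ | v6-M | 2-stage | 0.95 | – | – | – | – | –
M3 | v7-M | 3-stage | 1.25 | – | – | – | – | –
M4 | v7E-M | 3-stage | 1.25 | ✓ | FPv4-SP (opt) | – | – | –
M7 | v7E-M | 6-stage dual-issue | 2.14 | ✓ | FPv5 SP/DP (opt) | – | – | L1 I+D
M23 | v8-M Base | 2-stage | 0.99 | – | – | – | opt | –
M33 | v8-M Main | 3-stage | 1.5 | opt | FPv5-SP (opt) | – | opt | –
M35P | v8-M Main | 3-stage | 1.5 | opt | FPv5-SP (opt) | – | opt | opt I
M52 | v8.1-M | 4-stage | 1.6 | ✓ | FPv5 SP (opt) | ✓ (int+FP) | opt | opt
M55 | v8.1-M | 4-stage | 1.7 | ✓ | FPv5 SP (opt) | ✓ (int+FP) | opt | I+D
M85 | v8.1-M | 7-stage dual-issue | 4.0 | ✓ | FPv5 SP/DP (opt) | ✓ (int+FP) | opt | I+D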
Headline numbers are upper-bound estimates from Arm's own benchmarking. Real silicon depends on memory latency, wait states, and which options the SoC vendor licensed.
07

Armv6-M — The Minimal M-Profile

Instruction set

  • Thumb subset: ~56 instructions.
  • No integer divide — software emulation.
  • No bit-field ops, no saturating arithmetic.
  • 32×32→32 multiply (1-cycle or 32-cycle by option).
  • No IT instruction, and almost everything is a 16-bit encoding (only BL, MRS/MSR, and DSB/DMB/ISB are 32-bit).

System features

  • No bit-banding.
  • No BASEPRI — only PRIMASK for IRQ masking.
  • Max 32 external IRQs, 4 priority levels.
  • MPU optional on M0+ (8 regions) — not on M0/M1.
  • Single vector table; no VTOR on M0 (relocation via remap-to-SRAM trick on some SoCs); VTOR on M0+.
Why it still matters: v6-M is cheap enough to drop in as a wake-up/boot controller alongside something far bigger. It is also the ISA targeted by a number of Arm-compatible open clones.
08

Armv7-M / v7E-M — Full Thumb-2

v7-M (M3)

  • Full Thumb-2 (~200 instructions).
  • Hardware divide (SDIV/UDIV, 2–12 cycles).
  • Bit-field: BFI, BFC, UBFX, SBFX.
  • Exclusive access: LDREX/STREX (word, half, byte).
  • Bit-banding, NVIC, full MPU option.
  • Up to 240 IRQs, 8-bit priority field (implementations expose 3–8 bits).
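
The "3–8 bits" point trips people up because the implemented bits are the most-significant bits of the 8-bit field. A minimal sketch of that mapping follows; the helper names and the prio_bits parameter are hypothetical (CMSIS device headers carry the real value as __NVIC_PRIO_BITS), and this is a host-side model rather than register access code.

```c
#include <stdint.h>

/* Hypothetical helper: convert a logical priority level (0 = most urgent)
 * into the 8-bit value held in an NVIC priority field when only the top
 * `prio_bits` bits are implemented (3..8 on v7-M). */
static inline uint8_t nvic_encode_priority(uint32_t level, uint32_t prio_bits)
{
    uint32_t max_level = (1u << prio_bits) - 1u;
    if (level > max_level) {
        level = max_level;               /* clamp to the least urgent level */
    }
    return (uint8_t)(level << (8u - prio_bits));
}

/* Model of a read-back: unimplemented low bits read as zero. */
static inline uint8_t nvic_readback(uint8_t written, uint32_t prio_bits)
{
    uint8_t mask = (uint8_t)(0xFFu << (8u - prio_bits));
    return (uint8_t)(written & mask);
}
```

With 4 implemented bits, level 5 is stored as 0x50, and writing 0xFF to a 3-bit implementation reads back as 0xE0.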

v7E-M adds (M4, M7)

  • DSP extension: QADD/QSUB, QADD8/QADD16, SSAT16/USAT16, SMLAD, SMLALD, SMMUL etc. (plain SSAT/USAT and SMLAL are already in base v7-M).
  • Packed-operand SIMD on 32-bit registers (2×16-bit or 4×8-bit lanes).
  • Optional FPU: single-precision FPv4-SP on M4; FPv5 single- or double-precision on M7.
The DSP extension alone is a major interview topic for any DSP-adjacent role — see presentation 04.
09

Armv8-M — TrustZone Arrives

v8-M Baseline (M23)

  • Armv6-M + TrustZone (the "hardware security" part).
  • Adds SG, BXNS, BLXNS, MOVW/MOVT, hardware divide.
  • Keeps the small area footprint of the M0+.

v8-M Mainline (M33, M35P)

  • Superset of v7-M + TrustZone.
  • Introduces stack-limit registers (MSPLIM, PSPLIM) — HW-enforced stack overflow detection.
  • Co-processor interface (ACI) for vendor-specific instructions.
  • Separate S/NS banks for SP, PSPLIM, MSPLIM, CONTROL, FAULTMASK, BASEPRI, PRIMASK.
Armv8.1-M (M52/M55/M85) — layered on top of v8-M Mainline. Adds Helium (MVE), low-overhead loops (LO Branch extension), custom-instruction framework, optional PACBTI. See presentation 04.
10

Thumb-2 — Mixed 16/32-bit Encoding

The idea

  • Cortex-M runs only in Thumb state — no A32 (ARM state), no A64.
  • Most common instructions encode in 16 bits → excellent code density.
  • Less-common or wider-immediate variants use a 32-bit encoding, distinguishable by the top 5 bits of the first half-word.
  • Result: ≈ A32 (32-bit ARM) performance at ≈ A32 code size divided by 1.3, i.e. roughly three-quarters of the size.
Code density matters more than clock speed when your flash is 64 KB and costs $0.02/KB.

Encoding at a glance

; 16-bit encodings (common case)
  MOVS  r0, #42        ; 0x202A
  ADDS  r0, r0, r1     ; 0x1840
  LDR   r0, [r1, #4]   ; 0x6848

; 32-bit encodings (wider imm, etc.)
  MOVW  r0, #0x1234    ; F241 2034
  BL    some_func      ; F7FF FFFE (before relocation)
  UDIV  r0, r1, r2     ; FBB1 F0F2
  SMLAL r0,r1,r2,r3    ; FBC2 0103

Decoder inspects the first half-word: bits[15:11] = 11101, 11110, or 11111 → 32-bit instruction.
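
The length rule above is small enough to model directly. The sketch below is a host-side illustration, not core or toolchain code; the function name thumb_insn_size is hypothetical.

```c
#include <stdint.h>

/* Size in bytes of the Thumb instruction whose first half-word is hw1.
 * bits[15:11] of 0b11101, 0b11110 or 0b11111 mark a 32-bit encoding;
 * everything else is a 16-bit instruction. */
static inline unsigned thumb_insn_size(uint16_t hw1)
{
    unsigned top5 = (unsigned)hw1 >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}
```

A disassembler or fault handler can use the same test to step over an instruction: read the half-word at the faulting PC and advance by the returned size.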

11

The IT (If-Then) Block

A32 puts a condition code on almost every instruction. Thumb-2 instead uses a lightweight IT instruction to predicate the next 1–4 instructions.

; if (r0 < r1) r2 = r0; else r2 = r1;
  CMP   r0, r1
  ITE   LT          ; If Then Else
  MOVLT r2, r0      ; IF branch
  MOVGE r2, r1      ; ELSE branch
  • Encodes "T" and "E" for up to 4 following instructions.
  • Avoids short branches; predictable timing → good for ISRs.

Interview gotchas

  • An IT block is architectural — the CPU tracks ITSTATE in EPSR. You cannot branch into the middle of one.
  • An exception taken inside an IT block is fine — hardware saves xPSR (which includes ITSTATE) on stack and restores it on return.
  • Armv8-A (AArch32) deprecates IT blocks covering more than one instruction; M-profile keeps full IT support, and v8.1-M adds conditional-select instructions (CSEL, CSINC, CSINV, CSNEG) for the same purpose.
12

Core Registers — R0 to R15

R0 – R3
Argument / scratch. Caller-saved. Auto-stacked on exception.
R4 – R11
Variable registers. Callee-saved.
R12 (IP)
Intra-procedure scratch. Used by linker veneers. Auto-stacked.
R13 (SP)
Stack pointer — banked MSP / PSP (and on v8-M, S/NS variants).
R14 (LR)
Link register. On exception: holds EXC_RETURN magic value.
R15 (PC)
Program counter. Always even: the Thumb bit lives in bit 0 of branch targets and vector-table entries (and in EPSR.T), not in the PC itself.

AAPCS calling convention

  • Args in R0–R3, then stack.
  • Return value in R0 (or R0:R1 for 64-bit).
  • R4–R11 and SP preserved across calls; LR is not (BL overwrites it).
  • Stack 8-byte aligned at public interfaces.
The auto-stacked set {R0-R3, R12, LR, PC, xPSR} = exactly the caller-saved set + return state. Hardware is effectively doing the function-prologue save for the ISR.
13

xPSR — The Program Status Register

Bits | Field | View | Meaning
31 | N | APSR | Negative
30 | Z | APSR | Zero
29 | C | APSR | Carry
28 | V | APSR | oVerflow
27 | Q | APSR | Q-sticky (saturation)
26:25 | IT[1:0] | EPSR | If-Then state, low bits
24 | T | EPSR | Thumb state (always 1)
23:20 | reserved | |
19:16 | GE[3:0] | APSR | DSP SIMD flags
15:10 | IT[7:2] | EPSR | If-Then state, high bits; IT[7:0] holds the conditions for up to 4 following instructions
9 | reserved | |
8:0 | ISR # | IPSR | Active exception number (0 in Thread mode)
Three logical views of one physical register: APSR (flags), EPSR (execution state), IPSR (active exception number). MRS/MSR can address each view separately or in combination; EPSR reads as zero and ignores writes. On exception entry, the whole xPSR is pushed as the 8th auto-stacked word.
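
When inspecting a stacked xPSR in a fault handler, it helps to split the word back into its three views. The following is a sketch under the v7-M layout (IPSR in bits 8:0, IT split across bits 26:25 and 15:10); the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     n, z, c, v, q;   /* APSR flags, bits 31..27               */
    bool     thumb;           /* EPSR.T, bit 24                        */
    uint8_t  ge;              /* APSR.GE[3:0], bits 19..16             */
    uint8_t  it;              /* EPSR.IT[7:0], reassembled from 2 fields */
    uint16_t exception;       /* IPSR, bits 8..0 (0 = Thread mode)     */
} xpsr_fields_t;

static inline xpsr_fields_t xpsr_decode(uint32_t xpsr)
{
    xpsr_fields_t f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.thumb = (xpsr >> 24) & 1u;
    f.ge = (uint8_t)((xpsr >> 16) & 0xFu);
    /* IT[7:2] sits in bits 15:10, IT[1:0] in bits 26:25 */
    f.it = (uint8_t)((((xpsr >> 10) & 0x3Fu) << 2) | ((xpsr >> 25) & 0x3u));
    f.exception = (uint16_t)(xpsr & 0x1FFu);
    return f;
}
```

For example, 0x61000000 decodes to Z and C set, Thumb state, Thread mode, while a low field of 15 identifies SysTick as the active exception.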
14

Special Registers

Reg | Width | Purpose | Access
PRIMASK | 1 bit | Master IRQ disable. cpsid i sets; cpsie i clears. Blocks all exceptions except NMI and HardFault. | Privileged
FAULTMASK | 1 bit | Like PRIMASK but also blocks HardFault. Auto-cleared on return from handler. | Privileged · not on v6-M
BASEPRI | 8 bit | Block all IRQs of numerical priority ≥ BASEPRI (lower number = higher priority); 0 disables the mask. Finer-grained than PRIMASK. | Privileged · not on v6-M
CONTROL | 3 bit | Bit 0 (nPRIV): Thread mode privilege; 1 = unprivileged. Bit 1 (SPSEL): stack in use; 0 = MSP, 1 = PSP (in Thread mode). Bit 2 (FPCA): FPU context active, controls lazy stacking. | Privileged (writing)
MSPLIM / PSPLIM | 32 bit | Stack-pointer lower limit. Hardware UsageFault on stack descent past limit. | Privileged · v8-M Mainline only
Interview tell: a candidate who instinctively reaches for BASEPRI (not PRIMASK) to build a critical section shows they understand priority-preemption: low-priority FreeRTOS kernel locks don't need to block a high-priority motor-control ISR (and neither register can block NMI).
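
The difference between the two masks can be captured in a few lines. This is a simplified software model, assuming a hypothetical function name and ignoring the currently active exception priority and FAULTMASK; it is not how the NVIC is implemented.

```c
#include <stdint.h>
#include <stdbool.h>

/* Would a pending interrupt with priority value `prio` get past the masks?
 * Lower numerical value means higher urgency. */
static inline bool irq_passes_masks(uint8_t prio, uint8_t basepri, bool primask)
{
    if (primask) {
        return false;                 /* PRIMASK blocks every configurable IRQ */
    }
    if (basepri != 0u && prio >= basepri) {
        return false;                 /* BASEPRI blocks equal or lower urgency */
    }
    return true;                      /* BASEPRI == 0 means no masking         */
}
```

With BASEPRI set to 0x50, a motor-control ISR at 0x20 still runs while a UART ISR at 0x80 waits; with PRIMASK set, both wait.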
15

Operating State Matrix

 
Mode | Privileged | Unprivileged
Thread mode | Privileged Thread: boot state · kernel code · RTOS idle task · full SCS access | Unprivileged Thread: user tasks · no SCB/NVIC writes · MPU enforces memory isolation
Handler mode | Handler (always Privileged): every exception / interrupt runs here · always on MSP · IPSR.ISR # ≠ 0 | n/a

Transitions — to Handler

  • Any exception (IRQ, fault, SVC, PendSV, SysTick).
  • Hardware auto-stacks 8 registers, switches SP to MSP, IPSR ← exception #.

Transitions — back to Thread

  • Handler returns by loading EXC_RETURN value into PC (via BX LR, POP {PC}, etc.).
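  • Magic bits in EXC_RETURN choose Thread vs Handler mode and MSP vs PSP on return. CONTROL.nPRIV is not changed by exception entry or return, so Thread mode resumes at its previous privilege level.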
16

Dual Stacks — MSP & PSP

MSP (Main Stack Pointer): used by all handlers, the RTOS kernel, and boot & idle code. At reset, SP loads from vector[0].
PSP (Process Stack Pointer): used by each task, in Thread mode when CONTROL.SPSEL=1. Set up by the RTOS per task.

Why two stacks?

  • MPU can restrict user-task stacks to their own PSP region without blocking the kernel's MSP.
  • A rogue task that blows its stack cannot corrupt kernel state: at worst it takes a MemManage fault from the MPU or, on v8-M Mainline, a UsageFault from its own PSPLIM.
  • Exception always stacks on the currently-active SP, then switches to MSP for the handler. On return, CPU restores the previous SP selection from EXC_RETURN.
; typical RTOS switch to Unpriv Thread w/ PSP
  LDR   r0, =task_stack_top
  MSR   PSP, r0
  MOVS  r0, #0b011       ; SPSEL=1, nPRIV=1
  MSR   CONTROL, r0
  ISB
  BX    lr               ; return to task
17

Exception Stack Frame

Basic frame (8 words):
Offset | Register
+0x1C | xPSR
+0x18 | PC (return address)
+0x14 | LR
+0x10 | R12
+0x0C | R3
+0x08 | R2
+0x04 | R1
+0x00 | R0  <- SP

Extended frame (FP active): 26 words total (18 FP + 8 basic). From high to low address: a reserved word, FPSCR, S15 ... S0, then xPSR, PC, LR, R12, R3, R2, R1, R0 laid out as in the basic frame. Lazy: space reserved, regs saved on first FP op.
  • Hardware pushes exactly the caller-saved set. The handler may compile as an ordinary C function with no prologue tricks.
  • Stack is aligned to 8 bytes (AAPCS) — CPU inserts padding and flags it in bit 9 of the stacked xPSR.
  • If FPU is enabled and context has been touched (CONTROL.FPCA=1), an extended frame of 26 words is used.
  • Lazy stacking (v7E-M, v8-M): space is reserved but S0-S15/FPSCR are not actually written until the handler itself executes an FP instruction. Typical 17-cycle saving.
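
A fault handler usually overlays a struct on the stacked frame. The sketch below covers the basic 8-word frame only; the type and helper names are hypothetical, and the pre-exception SP calculation assumes no FP context was stacked.

```c
#include <stdint.h>
#include <stddef.h>

/* Basic exception frame, lowest address first (R0 is at the new SP). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

_Static_assert(sizeof(basic_frame_t) == 32, "basic frame is 8 words");
_Static_assert(offsetof(basic_frame_t, pc) == 0x18, "PC is at +0x18");
_Static_assert(offsetof(basic_frame_t, xpsr) == 0x1C, "xPSR is at +0x1C");

/* Bit 9 of the stacked xPSR records a 4-byte alignment pad. */
static inline int frame_was_padded(const basic_frame_t *f)
{
    return (int)((f->xpsr >> 9) & 1u);
}

/* SP value the interrupted code was using, given the frame's address. */
static inline uint32_t sp_before_exception(uint32_t frame_addr,
                                           const basic_frame_t *f)
{
    return frame_addr + (uint32_t)sizeof *f + (frame_was_padded(f) ? 4u : 0u);
}
```

In a real handler, frame_addr would come from MSP or PSP, chosen according to the EXC_RETURN value in LR.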
18

EXC_RETURN — The Magic LR

On exception entry, LR is loaded with a value whose upper bits [31:5] are all 1s. Bit [4] selects the frame type (1 = basic, 0 = extended FP frame) and bits [3:0] encode how to return:

; v7-M EXC_RETURN values

  0xFFFFFFF1   Handler → Handler   MSP   Basic
  0xFFFFFFF9   Handler → Thread    MSP   Basic
  0xFFFFFFFD   Handler → Thread    PSP   Basic

  0xFFFFFFE1   Handler → Handler   MSP   Extended (FP)
  0xFFFFFFE9   Handler → Thread    MSP   Extended
  0xFFFFFFED   Handler → Thread    PSP   Extended

Executing BX lr with any of these forces an exception-return sequence: unstack the frame, restore xPSR / IPSR, resume.

v8-M extensions

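  • Bit [6] (S) records whether the frame sits on the Secure or Non-secure stack; bit [0] (ES) records which security domain the exception was taken to.
  • Bit [5] (DCRS) indicates whether the default callee-register stacking rules were followed, i.e. whether additional context and the integrity signature were stacked during a cross-domain exception.
  • An invalid EXC_RETURN used in Handler mode → UsageFault (INVPC). In Thread mode the value is not an exception return at all: it is treated as an ordinary branch into execute-never address space and faults on the fetch.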
Common bug: a handler written in pure assembler clobbers LR and forgets to reload EXC_RETURN before BX lr. Result: random jump, HardFault, or worse.
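
Decoding the v7-M values listed above is a matter of testing three bits. The sketch below is a host-side model with hypothetical names; it accepts only the six v7-M encodings from the table and does not handle the extra v8-M bits.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool valid;        /* one of the six v7-M EXC_RETURN values    */
    bool to_thread;    /* bit 3: 1 = return to Thread mode         */
    bool use_psp;      /* bit 2: 1 = unstack from PSP              */
    bool basic_frame;  /* bit 4: 1 = basic frame, 0 = extended (FP) */
} exc_return_t;

static inline exc_return_t exc_return_decode(uint32_t lr)
{
    exc_return_t r;
    uint32_t low = lr & 0xFu;
    r.valid = ((lr & 0xFFFFFFE0u) == 0xFFFFFFE0u) &&
              (low == 0x1u || low == 0x9u || low == 0xDu);
    r.to_thread   = (lr >> 3) & 1u;
    r.use_psp     = (lr >> 2) & 1u;
    r.basic_frame = (lr >> 4) & 1u;
    return r;
}
```

A fault handler can use use_psp to decide whether the stacked frame lives at MSP or PSP before reading the saved PC.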
19

Pipeline Comparison

Core | Stages | Pipeline | Branch prediction | Issue width
M0 / M3 / M4 | 3 | Fetch → Decode → Execute | None on M0; simple static branch speculation on M3/M4 | 1 (in-order)
M0+ | 2 | Fetch/Decode → Execute | None (single-cycle branch target) | 1
M23 | 2 | Fetch → Decode/Execute | Static | 1
M33 / M52 / M55 | 3–4 | F → D → E (→ WB on M55) | Static | 1
M7 | 6 | F1 F2 D1 D2 EX WB | Dynamic (BHT, BTB) | Dual-issue (in-order)
M85 | 7 | F1 F2 D1 D2 I EX WB | Dynamic (BHT, BTB, RAS) | Dual-issue (in-order)
Even the deepest Cortex-M pipelines are in-order. There is no register renaming, no out-of-order execution, no speculation past branches without rewinding. This is a deliberate design choice: deterministic WCET matters more than peak IPC for the target workloads.
20

Endianness & Alignment

Endianness

  • Cortex-M supports either little- or big-endian data, fixed at reset by a configuration input (BIGEND) or at synthesis; it cannot be switched at run time.
  • Virtually all shipping silicon is little-endian. Big-endian exists in the architecture, but you are unlikely to see it in the wild.
  • Data endianness only: instruction fetches are always little-endian.
  • REV, REV16, REVSH, RBIT instructions for byte/bit swaps.

Alignment

  • v7-M / v7E-M / v8-M: unaligned word & half-word loads/stores supported in Normal memory (split into aligned beats, several cycles).
  • v6-M: unaligned access always faults (HardFault). M3/M4 also fault (UsageFault) if CCR.UNALIGN_TRP=1.
  • Device & Strongly-Ordered memory: unaligned always faults.
  • Stack must be 8-byte aligned at exception entry; hardware enforces.
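
Code that must run on v6-M as well as v7-M cannot rely on unaligned loads, and protocol code often needs a fixed byte order regardless of the core. A minimal portable sketch follows; the helper names are hypothetical, and rev32 is a plain-C equivalent of what the REV instruction computes.

```c
#include <stdint.h>

/* Byte-reverse a word: the same result REV produces. */
static inline uint32_t rev32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Read a little-endian 32-bit value from any address, byte by byte,
 * so no unaligned word access is ever issued. */
static inline uint32_t load_le32(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Same for big-endian (network order) data. */
static inline uint32_t load_be32(const void *p)
{
    return rev32(load_le32(p));
}
```

Byte-wise access costs a few extra cycles on cores that handle unaligned loads in hardware, but behaves identically on every Cortex-M.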
21

Semaphore Primitives — LDREX / STREX

; atomic increment of *p
try:
    LDREX  r1, [r0]      ; tag exclusive
    ADDS   r1, r1, #1
    STREX  r2, r1, [r0]  ; r2=0 ok, 1 fail
    CMP    r2, #0
    BNE    try
    DMB
  • Single-core architectural primitive — maps to a local monitor in each Cortex-M.
  • Any exception, context switch, or write to the tagged address clears the monitor → retry.
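  • Used by C11 atomics and by RTOS kernels that need lock-free primitives on M3 and later.
  • Byte and halfword variants (LDREXB/STREXB, LDREXH/STREXH) already exist in v7-M; v8-M adds load-acquire/store-release forms (LDA/STL, LDAEX/STLEX) with clearer ordering rules.
  • M0 / M0+ (v6-M) lack LDREX/STREX and use cpsid i / PRIMASK critical sections instead; v8-M Baseline (M23) adds them.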
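
The same retry pattern can be written portably with C11 atomics. The sketch below uses a weak compare-exchange, which is allowed to fail spuriously much like STREX; on Armv7-M, GCC and Clang typically lower it to an LDREX/STREX loop, though that depends on the toolchain and target flags. The function name is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically increment *p and return the new value. */
static inline uint32_t atomic_increment(_Atomic uint32_t *p)
{
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure `old` is refreshed with the current value, so the
     * loop simply retries with up-to-date data. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + 1u,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* retry */
    }
    return old + 1u;
}
```

In practice atomic_fetch_add does the same job in one call; the explicit loop is shown only because it mirrors the assembly above.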
22

Bus Interfaces

Core | Bus | Notes
M0 / M0+ / M1 | AHB-Lite | Single 32-bit master bus; separate PPB for CoreSight/NVIC
M3 / M4 | AHB-Lite (I-Code / D-Code / System bus + PPB) | Code-space traffic on I-Code & D-Code; everything else on System bus → SoC can arbitrate independently
M7 | AXI-M + AHB peripheral port + ITCM/DTCM | AXI for high-bandwidth external memory; TCM for deterministic code/data
M23 / M33 / M35P | AHB-5 (TrustZone-aware) | Security attributes (HNONSEC) on every transaction
M52 / M55 / M85 | AHB-5 + optional AXI + TCM + Co-processor (ACI) bus | ACI exposes dedicated instruction opcodes to an attached accelerator
Harvard-style split (I-bus / D-bus) lets the CPU fetch instructions and operate on data in parallel — critical for hitting the 1 DMIPS/MHz target on a 3-stage pipeline. The bus matrix in the SoC ultimately collapses them onto a unified memory, but the core sees separate paths.
23

Reset Behaviour

What the CPU does out of reset

  1. Read 32-bit word at address 0x00000000 → load into MSP.
  2. Read 32-bit word at address 0x00000004 → load into PC (reset vector).
  3. Enter Privileged Thread mode, MSP selected, T-bit = 1.
  4. All NVIC IRQs disabled, VTOR = 0, SCB/MPU unconfigured.
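
The first two steps can be modelled on a host to sanity-check a firmware image before flashing. This is an illustrative sketch with hypothetical names, taking the first two words of the image as input; it mirrors the reset sequence above rather than any vendor tool.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t msp;    /* initial main stack pointer, from vector[0] */
    uint32_t pc;     /* reset handler address with bit 0 stripped  */
    bool     thumb;  /* bit 0 of vector[1]; must be 1 on Cortex-M  */
} reset_state_t;

static inline reset_state_t reset_from_vectors(const uint32_t vectors[2])
{
    reset_state_t s;
    s.msp   = vectors[0];
    s.thumb = (vectors[1] & 1u) != 0u;
    s.pc    = vectors[1] & ~1u;
    return s;
}
```

A reset vector with bit 0 clear would leave the T-bit at 0, and the core faults on the first instruction, so thumb == false is worth flagging as a build error.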

Typical startup code

Reset_Handler:           ; v7-M and later: relies on IT and post-indexed addressing
  LDR    r0, =_sidata    ; .data init
  LDR    r1, =_sdata
  LDR    r2, =_edata
copy: CMP r1, r2
  ITT    LT
  LDRLT  r3, [r0], #4
  STRLT  r3, [r1], #4
  BLT    copy

  LDR    r0, =_sbss      ; .bss zero
  LDR    r1, =_ebss
  MOVS   r2, #0
zero: CMP r0, r1
  ITT    LT
  STRLT  r2, [r0], #4
  BLT    zero

  BL     SystemInit      ; PLLs, caches, VTOR
  BL     __libc_init_array
  BL     main
  B      .               ; trap here if main() ever returns
24

Vendor Variation You Will See

ST — STM32

Largest Cortex-M portfolio. F0 (M0), F1/F3/F4 (M3/M4), F7/H7 (M7), L5/U5 (M33), H5 (M33), U0 (M0+).

Vendor HAL + LL layers on top of CMSIS.

NXP

LPC5500 (M33+M33), i.MX RT (M7 @ 1 GHz "crossover MCU"), Kinetis (M0+/M4), S32K (automotive M4/M7).

Secure wake-up M33 often paired with a bigger A-profile.

Nordic · Silicon Labs · Renesas

nRF52 (M4), nRF53 (dual M33), nRF54 (M33); EFR32 wireless (M33); Renesas RA (M33/M85).

Strong wireless-stack + TrustZone story.

Microchip · Infineon · TI

SAM family (M0+/M4/M7/M23); PSoC 6 (M4+M0+); TM4C / MSP432 (M4/M4F).

Ambiq · Alif

Ambiq Apollo (M4F/M55, subthreshold, < 6 µA/MHz). Alif Ensemble (M55 + M55 + NPU).

Edge-ML showpieces.

Raspberry Pi

RP2040 (dual M0+), RP2350 (dual M33 or dual Hazard3 RISC-V; first official multi-arch MCU).

25

CMSIS — The Portable Layer

  • CMSIS-CORE — C headers for registers, intrinsics (__WFI, __DMB), NVIC/SCB/MPU inline functions. Every Cortex-M vendor ships this.
  • CMSIS-DSP — fixed/float DSP kernels; uses DSP & FPU instructions when available, scalar fallback when not.
  • CMSIS-NN — int8/int16 NN primitives optimised for M4 DSP and M55/M85 Helium.
  • CMSIS-RTOS v2 — thin API spec; FreeRTOS/Zephyr/RTX implement it.
  • CMSIS-Pack — device description files, used by Keil µVision, VS Code embedded tools, Arm Clang.
/* Portable across Cortex-M cores; __LDREXW needs v7-M or v8-M (not M0/M0+) */
#include <stdint.h>
#include "cmsis_compiler.h"

void enter_critical(void)
{
    __disable_irq();          /* cpsid i */
    __DMB();
}

uint32_t atomic_load(volatile uint32_t *p)
{
    uint32_t v = __LDREXW(p);
    __CLREX();                /* drop monitor */
    return v;
}
A huge reason Cortex-M dominates: driver code written in 2009 against CMSIS-CORE typically still compiles and runs, largely unchanged, on a Cortex-M85.
26

Performance Snapshot

Core | Typical fmax | CoreMark/MHz | DMIPS/MHz | Typical silicon (65/40/22 nm)
M0 | ~50 MHz | 2.33 | 0.9 | < 12k gates
M0+ | ~50 MHz | 2.46 | 0.95 | < 12k gates
M3 | ~100 MHz | 3.32 | 1.25 | ~30k gates
M4 | ~120 MHz | 3.40 | 1.25 | ~35k (no FPU) / ~55k (with FPU)
M7 | 600 MHz | 5.01 | 2.14 | ~300k gates (+ caches)
M33 | ~160 MHz | 4.02 | 1.5 | ~45k gates
M55 | ~400 MHz | 4.35 | 1.7 | ~300k (with Helium)
M85 | ~700 MHz | 6.28 | 4.0 | > 600k gates (with options)

Numbers from Arm's published figures (2024/2025). Actual silicon depends heavily on library, process, and whether the vendor pushed the implementation for fmax or for area.
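
The per-MHz columns become an absolute score by multiplying with the clock. The helper below is a trivial sketch of that arithmetic with a hypothetical name; the result is an upper bound that assumes zero-wait-state memory, which real parts rarely sustain at fmax.

```c
/* Upper-bound benchmark score: per-MHz figure times clock in MHz. */
static inline double score_upper_bound(double per_mhz, double clock_mhz)
{
    return per_mhz * clock_mhz;
}
```

For the M7 row, 5.01 CoreMark/MHz at 600 MHz gives about 3006 CoreMark.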

27

Choosing a Cortex-M — Decision Flow

Start: pick a Cortex-M.

  1. Security isolation needed? Yes → v8-M: M23 / M33 / M35P / M52 / M55 / M85. No → next question.
  2. Peak perf needed? Yes → M7 (600 MHz) or M85 (700 MHz + MVE). No → next question.
  3. DSP / ML workload? Yes → M4 (DSP) · M7 · M55 / M85 (MVE). No → next question.
  4. Area / power critical? Yes → M0+ (no sec) · M23 (w/ TrustZone). No → default.

Default: M4 or M33 (mid-range sweet spot).
28

References & Further Reading

Arm, Armv6-M / Armv7-M / Armv8-M Architecture Reference Manuals (ARM DDI 0419, 0403, 0553)
Arm, Cortex-M0/M0+/M3/M4/M7/M23/M33/M55/M85 Technical Reference Manuals
Joseph Yiu, The Definitive Guide to Arm Cortex-M23 & Cortex-M33 Processors (Newnes, 2021)
Joseph Yiu, The Definitive Guide to Arm Cortex-M3 and Cortex-M4 Processors, 3rd ed. (Newnes, 2014)
Jonathan Valvano, Embedded Systems: Real-Time Interfacing to ARM Cortex-M Microcontrollers (2017)
Arm Developer, developer.arm.com/documentation: free TRMs, QRCs, CMSIS headers
Arm Community, CMSIS-Core on GitHub (github.com/ARM-software/CMSIS_5)
Wikipedia, "ARM Cortex-M" family article: a surprisingly good cross-reference table

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use. Code examples provided as-is.