ARM CORTEX-M · PRESENTATION 03

Exceptions, Interrupts & NVIC

Vector Table · Priorities · Tail-Chaining · Late Arrival · Faults
The hardware scheduler inside every Cortex-M
02

What Makes NVIC Special

  • The Nested Vectored Interrupt Controller is part of the core, not an SoC peripheral — same register layout on every Cortex-M.
  • Tightly-coupled to the pipeline: vector fetch and register stacking happen concurrently during exception entry.
  • Up to 240 external IRQs (v6-M: 32; v8-M: up to 480).
  • Pre-emptive, priority-based, deterministic latency.
  • Exception entry & exit sequences implemented in hardware — the handler looks like an ordinary C function.

Compare to 8051 / AVR

Classic MCUs jump to the vector and leave everything else to software — you write a PUSH/POP prologue. The NVIC does this in hardware and adds tail-chaining / late-arrival on top.

Compare to A-profile GIC

A GIC is a separate IP block with its own memory-mapped registers sitting across a bus. Exception entry costs ~hundreds of cycles because of the MMU & bus round-trip. NVIC is in the core and exception entry is 12 cycles on M3.

03

Exception Model Overview

System Exceptions (1–15)
  1 Reset (fixed -3)
  2 NMI (fixed -2)
  3 HardFault (fixed -1)
  4 MemManage (prog)
  5 BusFault (prog)
  6 UsageFault (prog)
  7 SecureFault (prog, v8-M)
  11 SVCall (prog)
  12 DebugMonitor (prog)
  14 PendSV (prog)
  15 SysTick (prog)

External IRQs (16 …)
Vendor-defined peripheral interrupts:
  • Timers (TIM, RTC)
  • Comms (UART, SPI, I²C, CAN, USB, Ethernet)
  • ADC / DAC / comparators
  • DMA channel done / error
  • GPIO (EXTI) lines
  • Radio / crypto engine
Max IRQs: v6-M = 32 · v7-M = 240 · v8-M = 480
Each IRQ occupies vector slot (16 + IRQn).
04

System Exceptions — The Full List

# | Name | Priority | Purpose | Available on
1 | Reset | -3 (fixed, highest) | Power-on, watchdog, SW reset via AIRCR.SYSRESETREQ | All
2 | NMI | -2 (fixed) | Non-maskable: clock failure, tamper, SoC-routed critical input | All
3 | HardFault | -1 (fixed) | Last-resort fault. Escalation target when a configurable fault is disabled or cannot be taken. | All
4 | MemManage | prog | MPU violation, XN (execute-never) violation | v7-M, v8-M Main
5 | BusFault | prog | Bus error (abort from memory system) | v7-M, v8-M Main
6 | UsageFault | prog | Undefined instr, unaligned, /0, stack limit hit (v8-M), invalid EXC_RETURN | v7-M, v8-M Main
7 | SecureFault | prog | TrustZone violation (NS access to S resource etc.) | v8-M Main with TrustZone, Secure state
11 | SVCall | prog | Supervisor call: SVC #imm. RTOS syscall trampoline. | All
12 | DebugMonitor | prog | Debug events when halting debug is disabled | v7-M, v8-M Main
14 | PendSV | prog | Software-pended exception: the classic RTOS context-switch vector | All
15 | SysTick | prog | 24-bit down-counter interrupt: the standard OS tick | All (optional on v6-M and v8-M Baseline implementations)
05

Vector Table Layout

Offset    Word
0x0000    Initial MSP (loaded into SP at reset)
0x0004    Reset_Handler
0x0008    NMI_Handler
0x000C    HardFault_Handler
0x0010    MemManage_Handler
0x0014    BusFault_Handler
0x0018    UsageFault_Handler
0x001C    SecureFault_Handler  (v8-M)
0x0020    (reserved)
0x0024    (reserved)
0x0028    (reserved)
0x002C    SVC_Handler
0x0030    DebugMon_Handler
0x0034    (reserved)
0x0038    PendSV_Handler
0x003C    SysTick_Handler
0x0040    IRQ0_Handler         ; vector 16
0x0044    IRQ1_Handler
...       ...
  • Table lives at the address held in VTOR (SCB->VTOR, at 0xE000_ED08).
  • At reset VTOR = 0, so the table must be at address 0 or the SoC remaps flash/ROM there.
  • Relocating to SRAM is a common trick: copy table, update VTOR, enable one ISR at a time.
  • VTOR must be aligned to the table size rounded up to a power of two, i.e. the next power of two at or above (num_vectors × 4) bytes. The minimum is typically 128 or 256 bytes, larger for big tables.
  • v8-M TrustZone: two VTORs, VTOR_S (Secure) and VTOR_NS (Non-Secure).
Forgetting the __DSB(); __ISB(); after updating VTOR → instructions in flight may still use the old table.
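The offset and alignment rules above reduce to a little arithmetic. The following is a minimal host-side sketch, not vendor code: the helper names irq_vector_offset and vtor_alignment_bytes are made up for illustration, and the 128-byte floor is an assumption taken from the "typically 128 or 256 bytes" remark. Check your device's reference manual for the real minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an external interrupt's vector: slot (16 + IRQn), 4 bytes each. */
static inline uint32_t irq_vector_offset(uint32_t irqn)
{
    return (16u + irqn) * 4u;
}

/* Required VTOR alignment for a table with num_irqs external interrupts:
   table size rounded up to a power of two, with an assumed 128-byte floor. */
static inline uint32_t vtor_alignment_bytes(uint32_t num_irqs)
{
    assert(num_irqs <= 480u);                 /* largest IRQ count quoted in this deck */
    uint32_t table_bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;
    while (align < table_bytes) {
        align <<= 1;                          /* next power of two at or above the size */
    }
    return align;
}
```

For a part with 82 IRQs the table is 98 words (392 bytes), so a RAM copy would need 512-byte alignment under these assumptions.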
06

Vector Table in C

/* Classic CMSIS-style vector table, abbreviated (GCC/Clang attributes) */
#include <stdint.h>

extern uint32_t _estack;            /* top of stack, provided by linker script */

void Reset_Handler(void);
void Default_Handler(void) { for (;;) { } }   /* alias target must be defined in this file */
void NMI_Handler(void)          __attribute__((weak, alias("Default_Handler")));
void HardFault_Handler(void)    __attribute__((weak, alias("Default_Handler")));
void SysTick_Handler(void)      __attribute__((weak, alias("Default_Handler")));
void EXTI0_IRQHandler(void)     __attribute__((weak, alias("Default_Handler")));
/* … one per peripheral … */

/* Array of const pointers to void(void) handlers */
__attribute__((section(".isr_vector"), used))
void (* const g_vector_table[])(void) = {
    (void (*)(void)) &_estack,       /* 0x000  initial MSP */
    Reset_Handler,                   /* 0x004 */
    NMI_Handler,                     /* 0x008 */
    HardFault_Handler,               /* 0x00C */
    /* system vectors … */
    SysTick_Handler,                 /* 0x03C */
    EXTI0_IRQHandler,                /* 0x040: first IRQ slot (illustrative; on a real STM32F4, IRQ0 is WWDG and EXTI0 is IRQ6) */
    /* peripherals … */
};
The weak / alias pattern lets unused handlers collapse to Default_Handler, while a strong definition of EXTI0_IRQHandler anywhere in the project overrides the weak symbol at link time, with no registration API. Clean, zero-overhead.
07

Priority Model

  • Lower numerical value = higher priority (counter-intuitive — source of many bugs).
  • System exceptions Reset/NMI/HardFault have fixed negative priorities and cannot be reordered.
  • All other exceptions (incl. configurable faults, PendSV, SysTick, external IRQs) are programmable.
  • Priority field is architecturally 8 bits but only the top N bits are implemented.

How many bits?

Core | Priority bits | Levels
M0, M0+, M1, M23 | 2 | 4
M3, M4, M7 | 3–8 (impl. choice) | 8 – 256
M33, M55, M85 | 3–8 (impl. choice) | 8 – 256

Most ST STM32 parts built on M3/M4/M7 implement 4 bits → 16 levels; NXP LPC parts vary by family. Nordic nRF52 implements 3 bits → 8 levels.

08

Why Priority Bits Live in the Upper Half

Full 8-bit field:      7 6 5 4 3 2 1 0
4-bit impl. (ST):      P P P P . . . .
3-bit impl. (Nordic):  P P P . . . . .
(P = implemented priority bit, . = unimplemented, reads as 0)
  • Implementations that add more bits keep existing software working: a 0x20 priority value retains the same relative ordering across vendors.
  • Writing to unimplemented low bits is harmless — they read back as 0.
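  • Gotcha: a raw register value of 0x01 on a 4-bit-impl MCU reads back as 0x00, i.e. the most urgent programmable priority (so does 0x0F, because only bits [7:4] are kept). Always set priorities via NVIC_SetPriority(), which takes the unshifted level (0–15 here) and handles the shift.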
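The upper-bits rule can be checked on a host with two small helpers. This is a sketch for illustration only; prio_level_to_raw and prio_raw_readback are hypothetical names, not CMSIS functions, and prio_bits stands in for the device's __NVIC_PRIO_BITS.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a logical level (0 = most urgent) to the 8-bit register value. */
static inline uint8_t prio_level_to_raw(uint32_t level, uint32_t prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    assert(level < (1u << prio_bits));
    return (uint8_t)(level << (8u - prio_bits));
}

/* What the hardware keeps of a raw write: unimplemented low bits read as 0. */
static inline uint8_t prio_raw_readback(uint8_t raw, uint32_t prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    uint8_t mask = (uint8_t)(0xFFu << (8u - prio_bits));
    return (uint8_t)(raw & mask);
}
```

Level 5 becomes 0x50 on a 4-bit part and 0xA0 on a 3-bit part, while a raw 0x20 survives unchanged on both, which is the portability property the slide describes.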
09

Priority Grouping — Preempt vs Sub-Priority

Each priority field is split into preempt priority (upper) and sub-priority (lower) by AIRCR.PRIGROUP.

(4 implemented bits shown, MSB first; P = preempt, S = sub-priority)
PRIGROUP=0:  P P P P
PRIGROUP=4:  P P P S
PRIGROUP=7:  S S S S

Preempt = can I interrupt another ISR?
Sub = who runs first if multiple are simultaneously pending at the same preempt level?
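As a rough illustration of the split, the sketch below applies the architectural rule that the preempt (group) field is bits [7:PRIGROUP+1] of the raw 8-bit priority and the sub-priority is bits [PRIGROUP:0]. The function names are invented for this example and operate on raw register values, not on CMSIS levels.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Preempt (group) priority: bits [7:PRIGROUP+1] of the raw 8-bit value. */
static inline uint32_t preempt_field(uint8_t raw, uint32_t prigroup)
{
    assert(prigroup <= 7u);
    return (uint32_t)raw >> (prigroup + 1u);
}

/* Sub-priority: bits [PRIGROUP:0]. */
static inline uint32_t sub_field(uint8_t raw, uint32_t prigroup)
{
    assert(prigroup <= 7u);
    return (uint32_t)raw & ((1u << (prigroup + 1u)) - 1u);
}

/* A new exception pre-empts only if its preempt field is numerically lower. */
static inline bool can_preempt(uint8_t new_raw, uint8_t active_raw, uint32_t prigroup)
{
    return preempt_field(new_raw, prigroup) < preempt_field(active_raw, prigroup);
}
```

With PRIGROUP=4 on a 4-bit part, raw priorities 0x40 and 0x50 share a preempt level and differ only in sub-priority, so neither can interrupt the other. With PRIGROUP=0 the same pair nests normally.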

Practical advice

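  • On FreeRTOS, configPRIO_BITS is the implemented priority-bit count; the Cortex-M ports expect every implemented bit to be preempt priority (PRIGROUP = 0 achieves this) and check it when configASSERT is defined. Sub-priority complicates reasoning, so pick one scheme.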
  • For kernel-aware interrupts (that may call xQueueSendFromISR), priority must be numerically ≥ configMAX_SYSCALL_INTERRUPT_PRIORITY.
  • Higher-priority (smaller number) ISRs may run "above the kernel" and must not touch RTOS objects.
10

Exception Entry Sequence (M3/M4)

cycle:     1    2    3    4    5    6    7    8    9    10   11    12
pipeline:  Thr  Thr  Stk  Stk  Stk  Stk  Stk  Stk  Stk  Stk  ISR1  ISR2
bus D:     W xPSR · W PC · W LR · W R12 · W R3 · W R2 · W R1 · W R0
bus I:     F thr · F thr · F vec · F vec · F ISR0 · F ISR1 · F ISR2

Entry (12 cycles)

  • Stacking of 8 words overlapped with vector fetch.
  • Load PC from vector → start fetching ISR.
  • On M7 (dual-issue + cache), Arm also quotes a nominal 12-cycle entry with zero-wait-state memory; cache misses or slow memory add to it.

Exit (12 cycles)

  • BX lr with EXC_RETURN pattern triggers unstack.
  • 8 words popped; IPSR & control restored; instructions of pre-empted code resume.
  • Unless another exception is pending that outranks the context being returned to → tail-chain (next slide).
11

Tail-Chaining

Without tail-chaining (hypothetical)

phase: Thr → Stk (8w) → ISR1 → Unstk → Stk → ISR2 → Unstk → Thr

Two 12-cycle stacks + two 12-cycle unstacks = 48 cycles of pure overhead.

With tail-chaining (actual)

phase: Thr → Stk (8w) → ISR1 → Chain (6 cyc) → ISR2 → Unstk → Thr

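Frame stays on stack; only 6 cycles to fetch the new vector & enter ISR2. A 12-cycle unstack plus a 12-cycle stack is replaced by one 6-cycle chain: 30 cycles of overhead instead of 48, saving 18 cycles per chained pair.

Tail-chaining triggers when, at the moment of return, the NVIC holds a pending exception that outranks the context being returned to. Typically that exception has equal or lower priority than the exiting handler, which is why it could not pre-empt it earlier. The CPU simply re-uses the existing stack frame and does a new vector fetch.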
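The cycle accounting on this slide generalises to a chain of N back-to-back handlers. The sketch below only restates the slide's M3/M4 figures (12-cycle stack, 12-cycle unstack, 6-cycle chain) as arithmetic; the macro and function names are invented here, and real latencies depend on memory wait states.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CYCLES       12u   /* figures quoted on the slide for M3/M4 */
#define UNSTACK_CYCLES     12u
#define TAIL_CHAIN_CYCLES   6u

/* Every handler pays a full stack and a full unstack. */
static inline uint32_t overhead_without_tail_chaining(uint32_t n_handlers)
{
    return n_handlers * (STACK_CYCLES + UNSTACK_CYCLES);
}

/* One stack, one unstack, and a 6-cycle chain between consecutive handlers. */
static inline uint32_t overhead_with_tail_chaining(uint32_t n_handlers)
{
    assert(n_handlers >= 1u);
    return STACK_CYCLES + (n_handlers - 1u) * TAIL_CHAIN_CYCLES + UNSTACK_CYCLES;
}
```

For two handlers this gives 48 versus 30 cycles; each additional chained handler costs 6 cycles instead of 24.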
12

Late-Arrival Preemption

Scenario

A low-priority IRQ fires → CPU starts stacking. Midway through, a higher-priority IRQ fires.

cycles:     1    2    3    4                  5    6    7    8    9
normal:     Thr  Stk  Stk  Stk                Stk  Stk  Stk  Stk  LO
late arr.:  Thr  Stk  Stk  Stk (!HI pending)  Stk  Stk  Stk  Stk  HI
  • The NVIC monitors pending IRQs during stacking.
  • If a higher-priority exception arrives, CPU completes the in-flight stacking (frame is valid) and redirects the vector fetch to the new target.
  • The lower-priority exception becomes pending → eventually tail-chains after the higher one.
Net effect: worst-case interrupt latency for a high-priority ISR is bounded by the stack time of any in-flight entry — not by the low-priority ISR executing to completion.
13

Pop-Preemption

During exit unstacking…

cycles:  1    2    3    4     5          6
state:   HI   Pop  Pop  !HI2  abort pop  HI2
  • If another IRQ preempts while the CPU is in the middle of unstacking, pop is aborted.
  • The current frame is reused — no re-push — and the new ISR vector is fetched.
  • Requires the frame to still be valid on the current stack; CPU guarantees that by only aborting in the early pop cycles.
Tail-chaining, late arrival, and pop-preemption together give Cortex-M a worst-case interrupt latency dominated by the 8-word stacking, not by handler-chain unwinding. This is the core reason the family dominates hard-real-time niches.
14

Lazy Floating-Point Stacking

Without lazy stacking

  • On exception entry, if FPU is active, CPU always pushes 18 extra FP words (S0–S15, FPSCR, padding).
  • ~17 extra cycles on M4, 20+ on M7.
  • Wasted if the ISR does not touch the FPU.

With lazy stacking

  • CPU reserves space (advances SP by 18 words) but skips the writes.
  • Sets FPCCR.LSPACT = 1.
  • If the ISR ever executes an FP instruction → hardware detects this, performs the deferred save, then continues.
  • If it never does → no FP registers written. Full save / restore avoided.
/* Enable lazy stacking (default on M4F with FPU on) */
#include <stdint.h>

#define FPCCR        (*(volatile uint32_t*)0xE000EF34)
#define FPCCR_ASPEN  (1U << 31)
#define FPCCR_LSPEN  (1U << 30)

/* Most startup code sets both:
   ASPEN = automatic FP state preservation on exception
   LSPEN = lazy: defer the save until the handler uses the FPU */
static inline void fpu_enable_lazy_stacking(void)   /* helper name is illustrative */
{
    FPCCR |= FPCCR_ASPEN | FPCCR_LSPEN;
}
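Trap: with LSPACT set, the first FP instruction executed in any handler triggers the deferred save into the reserved frame space. If that happens in a fault handler (e.g. HardFault) while the stack pointer or frame is already corrupt, the deferred save goes to the wrong place and you get chaos. In fault handlers: disable FPU (or never touch it).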
15

Wake-up Interrupt Controller (WIC)

[Diagram] IRQ sources (UART RX, GPIO, RTC tick…) → WIC (always clocked by 32 kHz / LF clock) → wake → Core + NVIC (clock gated in deep sleep)
  • WIC is a small always-on block that duplicates the NVIC's mask logic.
  • In deep sleep, core & NVIC are clock-gated (M7+) or power-gated (state-retentive).
  • When a masked-in IRQ asserts → WIC asserts WAKEUP to the power-management IP → clocks restore → NVIC takes the interrupt normally.
  • Entry latency grows by clock-start time (typ. a few µs to ~100 µs depending on LDO/PLL).

Needed because without the WIC, the NVIC cannot sense the IRQ line while its clock is off. With the WIC, sub-µA deep sleep with IRQ wake is possible.

16

SVCall vs PendSV — Two Roles

SVCall synchronous

  • Triggered by the SVC #imm instruction in user code.
  • Runs at configured priority (typically high) — returns via unstack as normal.
  • Classic use: syscall trampoline — user task requests kernel action; handler reads the imm byte to dispatch.
  • On v8-M, SVC from Non-Secure traps into NS handler; S-side has its own.
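Reading the immediate works because the stacked PC points at the instruction after the 16-bit Thumb SVC (encoding 0xDFxx), so the immediate is the byte two positions before it. The sketch below shows just that lookup in host-testable form. The function name is hypothetical, and in a real handler the pointer would come from the stacked frame, e.g. (const uint8_t *)frame[6].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* return_addr: the address held in the stacked PC, i.e. the instruction
   that follows the SVC. Thumb SVC is the halfword 0xDF00 | imm8, stored
   little-endian, so imm8 sits at return_addr[-2]. */
static inline uint8_t svc_number_from_return_address(const uint8_t *return_addr)
{
    assert(return_addr != NULL);
    return return_addr[-2];
}
```

A dispatcher would typically switch on the returned value and read the call's arguments from the stacked R0 to R3.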

PendSV software-pended

  • Set by writing SCB->ICSR.PENDSVSET.
  • Runs at its own priority slot; typically configured as the lowest priority.
  • Used by RTOS kernels for the context switch — PendSV is guaranteed to run after all higher-priority IRQs finish, so the switch can never pre-empt a critical ISR.
  • Classic FreeRTOS pattern: SysTick or kernel syscall pends PendSV → PendSV_Handler performs the switch.
17

SysTick — The Built-in Tick Timer

  • 24-bit down-counter in the SCS. Counts at core clock or external (impl.-defined) clock.
  • Fires SysTick exception (#15) when the counter counts down to 0; it then reloads from SYST_RVR.
  • Register set:
    • SYST_CSR — enable, tickint, clksource, countflag.
    • SYST_RVR — reload value.
    • SYST_CVR — current.
    • SYST_CALIB — implementation-provided 10 ms calibration constant.
/* 1 kHz tick on a 168 MHz M4 */
#include <stdint.h>

#define SYST_CSR  (*(volatile uint32_t*)0xE000E010)
#define SYST_RVR  (*(volatile uint32_t*)0xE000E014)
#define SYST_CVR  (*(volatile uint32_t*)0xE000E018)

void systick_init(uint32_t core_hz, uint32_t tick_hz)
{
    /* e.g. 168 MHz / 1 kHz - 1 = 167 999; the value must fit in 24 bits */
    SYST_RVR = core_hz / tick_hz - 1;
    SYST_CVR = 0;                     /* any write clears the counter */
    SYST_CSR = (1U<<2) |   /* CLKSOURCE = core */
               (1U<<1) |   /* TICKINT */
               (1U<<0);    /* ENABLE */
}

A single 24-bit counter means max interval at 168 MHz ≈ 99.9 ms. Tickless RTOS designs pair SysTick with the RTC for longer sleeps.
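The 24-bit limit is easy to trip over when the tick rate is configurable. A minimal sketch of a range check follows, assuming the counter is clocked from the core clock; systick_reload_value and systick_max_interval_us are illustrative names, not CMSIS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* 24-bit counter */

/* Computes the SYST_RVR value for the requested tick rate.
   Returns false if the rate cannot be represented in 24 bits. */
static inline bool systick_reload_value(uint32_t core_hz, uint32_t tick_hz,
                                        uint32_t *reload)
{
    assert(reload != NULL);
    if (tick_hz == 0u || core_hz < tick_hz) {
        return false;
    }
    uint32_t ticks = core_hz / tick_hz;          /* >= 1 here */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = ticks - 1u;
    return true;
}

/* Longest period one counter cycle can cover, in microseconds (rounded down). */
static inline uint64_t systick_max_interval_us(uint32_t core_hz)
{
    assert(core_hz != 0u);
    return ((uint64_t)SYSTICK_MAX_RELOAD + 1u) * 1000000u / core_hz;
}
```

At 168 MHz a 1 kHz tick needs a reload of 167 999, a 10 Hz tick does not fit, and the longest single period is 99 864 µs, which matches the ≈ 99.9 ms figure above.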

18

Fault Taxonomy

Fault hierarchy (configurable faults escalate to HardFault if disabled / nested):
  • HardFault: catch-all / escalation
  • MemManage: MPU / XN violation
  • BusFault: bus error abort
  • UsageFault: undefined / div0 / align…
  • SecureFault: v8-M TrustZone only
Configurable fault → HardFault if (a) not enabled in SHCSR, (b) same-/higher-priority fault nests, (c) fault in NMI/HardFault itself, (d) vector fetch failure.
Enable SCB->SHCSR.{MEMFAULTENA, BUSFAULTENA, USGFAULTENA} at boot so specific faults have distinct handlers. Otherwise every fault looks like HardFault, which is harder to debug.
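The three enable bits sit at SHCSR bits 16 to 18. The sketch below computes the new register value as a pure function so it can be checked off-target; on hardware you would apply it as SCB->SHCSR = shcsr_enable_config_faults(SCB->SHCSR). The macro and function names are illustrative (CMSIS provides its own SCB_SHCSR_*_Msk constants).

```c
#include <assert.h>
#include <stdint.h>

#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

static_assert((SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA)
                  == 0x00070000u,
              "enable bits occupy SHCSR[18:16]");

/* Returns the SHCSR value with all three configurable faults enabled,
   leaving every other bit (active/pended status) untouched. */
static inline uint32_t shcsr_enable_config_faults(uint32_t shcsr)
{
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```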
19

Decoding a HardFault

Where the evidence lives

  • SCB->HFSR — hard-fault status. FORCED=1 means an escalated configurable fault.
  • SCB->CFSR — configurable fault status register. Three fields: MMFSR (bits 7:0) / BFSR (bits 15:8) / UFSR (bits 31:16).
  • SCB->MMFAR — fault address if MMARVALID.
  • SCB->BFAR — bus-fault address if BFARVALID.
  • Auto-stacked frame holds the faulting PC, LR, xPSR.
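/* Assumes the CMSIS device header (for SCB) is included. */
__attribute__((naked))
void HardFault_Handler(void)
{
    __asm volatile (
        "TST lr, #4          \n"   /* EXC_RETURN bit 2: 0 = MSP, 1 = PSP */
        "ITE EQ              \n"
        "MRSEQ r0, MSP       \n"
        "MRSNE r0, PSP       \n"
        "B hard_fault_c      \n"); /* r0 = pointer to the stacked frame */
}

__attribute__((used))              /* only referenced from the asm above */
void hard_fault_c(uint32_t *sf)
{
    /* Frame layout: R0, R1, R2, R3, R12, LR, PC, xPSR */
    volatile uint32_t pc   = sf[6];
    volatile uint32_t lr   = sf[5];
    volatile uint32_t psr  = sf[7];
    volatile uint32_t cfsr = SCB->CFSR;
    (void)pc; (void)lr; (void)psr; (void)cfsr;
    /* write to RTT / flash / UART, then NVIC_SystemReset */
}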

Pattern: find the stack used at fault (MSP/PSP via bit 2 of LR/EXC_RETURN), grab stacked PC + CFSR, log it, reset.
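The bit tests on EXC_RETURN can also be done in C once LR has been captured. This is a small sketch under the v7-M encoding (bit 2 selects PSP, bit 3 selects Thread mode, bit 4 cleared means an extended FP frame was stacked); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All EXC_RETURN values have the top byte set to 0xFF. */
static inline bool exc_return_is_plausible(uint32_t lr)
{
    return (lr & 0xFF000000u) == 0xFF000000u;
}

/* Bit 2: 1 = frame was pushed on PSP, 0 = on MSP. */
static inline bool exc_return_used_psp(uint32_t lr)
{
    assert(exc_return_is_plausible(lr));
    return (lr & (1u << 2)) != 0u;
}

/* Bit 3: 1 = returning to Thread mode, 0 = to Handler mode. */
static inline bool exc_return_to_thread(uint32_t lr)
{
    assert(exc_return_is_plausible(lr));
    return (lr & (1u << 3)) != 0u;
}

/* Bit 4: 0 = extended frame (FP context space reserved), 1 = basic 8-word frame. */
static inline bool exc_return_has_fp_frame(uint32_t lr)
{
    assert(exc_return_is_plausible(lr));
    return (lr & (1u << 4)) == 0u;
}
```

The familiar values decode as expected: 0xFFFFFFF9 is Thread mode on MSP, 0xFFFFFFFD is Thread mode on PSP, and 0xFFFFFFED is the PSP case with an FP frame.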

20

Common Fault Causes — Recipes

Symptom | CFSR bit | Typical cause
UsageFault: UNDEFINSTR | UFSR.UNDEFINSTR | Jumped into data; bit-rot flash. (A function pointer with the Thumb bit cleared raises UFSR.INVSTATE instead.)
UsageFault: INVPC | UFSR.INVPC | EXC_RETURN corrupted (stack smash, bad handler asm).
UsageFault: UNALIGNED | UFSR.UNALIGNED | CCR.UNALIGN_TRP=1, or an unaligned LDM/STM/LDRD/STRD (these always fault). On v6-M every unaligned access goes straight to HardFault.
UsageFault: DIVBYZERO | UFSR.DIVBYZERO | CCR.DIV_0_TRP=1; signed/unsigned divide by 0.
BusFault: PRECISERR | BFSR.PRECISERR | Illegal peripheral address; access to region with no mapped slave.
BusFault: IMPRECISERR | BFSR.IMPRECISERR | Write-buffered error, so the fault address is not in BFAR; force DSB after writes or disable write buffer while debugging.
MemManage: DACCVIOL / IACCVIOL | MMFSR.DACCVIOL / IACCVIOL | MPU denied the access; check permissions & region config.
Stack-limit hit | UFSR.STKOF (v8-M) | SP went below PSPLIM/MSPLIM; set the limits with the CMSIS __set_MSPLIM() / __set_PSPLIM() intrinsics.
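A table-driven decoder makes these recipes usable from a fault logger. The sketch below covers only the flags listed in the table plus INVSTATE; bit positions are MMFSR in CFSR[7:0], BFSR in CFSR[15:8] and UFSR in CFSR[31:16]. The type and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t    mask;   /* bit position inside the 32-bit CFSR */
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t k_cfsr_flags[] = {
    { 1u << 0,  "MMFSR.IACCVIOL"   },
    { 1u << 1,  "MMFSR.DACCVIOL"   },
    { 1u << 9,  "BFSR.PRECISERR"   },
    { 1u << 10, "BFSR.IMPRECISERR" },
    { 1u << 16, "UFSR.UNDEFINSTR"  },
    { 1u << 17, "UFSR.INVSTATE"    },
    { 1u << 18, "UFSR.INVPC"       },
    { 1u << 20, "UFSR.STKOF"       },   /* v8-M Mainline only */
    { 1u << 24, "UFSR.UNALIGNED"   },
    { 1u << 25, "UFSR.DIVBYZERO"   },
};

/* Stores the names of set flags in out[] (at most max_out entries) and
   returns the number of names stored. */
static inline size_t cfsr_decode(uint32_t cfsr, const char *out[], size_t max_out)
{
    assert(out != NULL || max_out == 0u);
    size_t n = 0;
    for (size_t i = 0; i < sizeof k_cfsr_flags / sizeof k_cfsr_flags[0]; i++) {
        if ((cfsr & k_cfsr_flags[i].mask) != 0u && n < max_out) {
            out[n++] = k_cfsr_flags[i].name;
        }
    }
    return n;
}
```

A handler would pass SCB->CFSR and print the returned names next to the stacked PC.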
21

CMSIS NVIC API

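/* Enable a peripheral IRQ at priority level 5 (register value 0x50 with 4 implemented bits) */
NVIC_SetPriority(EXTI0_IRQn, 5);   /* takes the unshifted level; CMSIS applies the shift */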
NVIC_EnableIRQ(EXTI0_IRQn);

/* Pend in software */
NVIC_SetPendingIRQ(EXTI0_IRQn);

/* Get active ISR# (nonzero in Handler mode) */
uint32_t vect = __get_IPSR();

/* Atomic critical section for BASEPRI-filterable IRQs */
uint32_t prev = __get_BASEPRI();
__set_BASEPRI(0x20);
/* … */
__set_BASEPRI(prev);

Under the hood

  • NVIC_EnableIRQ(n) → sets bit n%32 in NVIC->ISER[n/32].
  • NVIC_SetPriority(n, p) → shifts p left by (8 - __NVIC_PRIO_BITS) and writes the resulting byte to NVIC->IPR[n].
  • Faults / PendSV / SysTick use SCB->SHPR[1..3] instead of IPR.
  • CMSIS macros on v8-M automatically target the Secure or NS register bank based on the current state — no source changes needed when moving code S↔NS.
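The register arithmetic in the bullets above can be reproduced on a host. The following sketch mirrors the index, mask and shift calculations only; it does not touch hardware, the helper names are invented, and prio_bits stands in for __NVIC_PRIO_BITS.

```c
#include <assert.h>
#include <stdint.h>

/* Which ISER word holds the enable bit for IRQ n. */
static inline uint32_t nvic_iser_index(uint32_t irqn)
{
    assert(irqn < 480u);
    return irqn / 32u;
}

/* Bit mask inside that word. */
static inline uint32_t nvic_iser_mask(uint32_t irqn)
{
    return 1u << (irqn % 32u);
}

/* Byte written to the priority register: level shifted into the
   implemented upper bits, then truncated to 8 bits. */
static inline uint8_t nvic_ipr_value(uint32_t level, uint32_t prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    return (uint8_t)((level << (8u - prio_bits)) & 0xFFu);
}
```

Because the result is truncated to 8 bits, passing an already-shifted value such as 0x50 on a 4-bit part yields 0x00, the most urgent level, rather than the intended one.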
22

ISR Design Patterns

Top-half only

ISR does the whole job — short, deterministic, no RTOS calls. Fits GPIO debouncing, simple UART byte echo, timer tick that just sets a flag.

Top + bottom half

ISR captures hardware state into a lock-free queue / ring buffer, then returns fast. Worker thread or deferred procedure handles the slow part. Use xQueueSendFromISR/xTaskNotifyFromISR with the pxHigherPriorityTaskWoken pattern.
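A lock-free queue between one ISR and one task is often a single-producer, single-consumer ring buffer. The sketch below is one common shape of it, assuming a single-core Cortex-M where aligned 32-bit loads and stores are atomic; ring_t, ring_put and ring_get are names chosen for this example, and multi-core or cached systems would additionally need memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u                      /* capacity in bytes */
static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0u, "RB_SIZE must be a power of two");

typedef struct {
    volatile uint32_t head;              /* written only by the producer (ISR) */
    volatile uint32_t tail;              /* written only by the consumer (task) */
    uint8_t data[RB_SIZE];
} ring_t;

/* Producer side: call from the ISR. Returns false if the buffer is full. */
static inline bool ring_put(ring_t *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {
        return false;                    /* full: drop the byte, never block */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;                /* publish only after the data write */
    return true;
}

/* Consumer side: call from the worker. Returns false if the buffer is empty. */
static inline bool ring_get(ring_t *rb, uint8_t *byte)
{
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;
    }
    *byte = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;
    return true;
}
```

Each index has exactly one writer, so neither side needs to disable interrupts. The ISR would follow ring_put with a task notification so the worker wakes up.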

Tickless / event-driven

Two priority bands — hard RT IRQs above configMAX_SYSCALL_INTERRUPT_PRIORITY must not touch the RTOS; soft RT IRQs below it may signal tasks.

Golden rule: an ISR should return in tens of microseconds at most. Anything longer steals determinism from every lower-priority source. If the work is unbounded, use DMA + completion IRQ, or defer.
23

FreeRTOS — Exception Priority Rules

Priority axis: 0 (highest) … 255 (lowest)
  • Above kernel: IRQs may NOT call FromISR APIs; can preempt the scheduler; e.g. motor-control 20 kHz PWM ISR
  • configMAX_SYSCALL_INTERRUPT_PRIORITY (boundary)
  • Kernel-safe: IRQs may call xQueueSendFromISR etc.; blocked by taskENTER_CRITICAL; e.g. UART, DMA completion
  • PendSV / SysTick: lowest priority, kernel own use

The exact numerical threshold is configured per port. With 4 impl. bits on STM32, a typical split is kernel-safe IRQs at raw priority 0x50 and numerically above (0x50–0xF0), with PendSV/SysTick at 0xF0.
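The banding rule is a single comparison on raw priority values. The sketch below shows it as a pure function that could back an assertion at the top of an ISR; irq_band_t and irq_band are hypothetical names, and max_syscall_raw stands for the port's configMAX_SYSCALL_INTERRUPT_PRIORITY in its shifted, register-value form.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    IRQ_ABOVE_KERNEL,   /* must not call any FromISR API */
    IRQ_KERNEL_SAFE     /* may call FromISR APIs */
} irq_band_t;

/* Both arguments are raw 8-bit register values (larger = less urgent). */
static inline irq_band_t irq_band(uint8_t raw_prio, uint8_t max_syscall_raw)
{
    assert(max_syscall_raw != 0u);   /* a threshold of 0 would leave no band above the kernel */
    return (raw_prio >= max_syscall_raw) ? IRQ_KERNEL_SAFE : IRQ_ABOVE_KERNEL;
}
```

With a threshold of 0x50, an ISR at 0x40 sits above the kernel, while ISRs at 0x50 or 0xF0 are kernel-safe.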

24

Common Bugs

1. Priority inversion via PRIMASK

FreeRTOS taskENTER_CRITICAL() sets BASEPRI — not PRIMASK — so high-priority "above-kernel" IRQs can still fire. Calling __disable_irq() in user code defeats this.

2. Setting NVIC priority before clearing pending

Changing priority of an ISR that is already pending has implementation-defined timing. Always disable IRQ, clear pending, set priority, then enable.

3. Calling FromISR on a too-high-priority IRQ

Silent corruption of kernel state; often manifests as a random HardFault hours later. Assert priority in the ISR.

4. Forgetting to clear peripheral pending bit

NVIC pending bit clears on entry, but the peripheral flag still asserts → re-entry forever. Always acknowledge in the peripheral's status register.

5. Write-buffered imprecise BusFault

Fault arrives cycles after the offending store — PC in the stacked frame is not where the bug is. DSB after suspect writes, or disable write-buffer via ACTLR.DISDEFWBUF during debug.

6. FPU touched in a fault handler with lazy stacking on

Partial frame + float op = spectacular crash. Build fault handlers so they contain no FP instructions (GCC for Arm offers -mgeneral-regs-only for this) or disable the FPU.

25

v8-M / TrustZone Additions

  • Two vector tables: VTOR_S (Secure) and VTOR_NS (Non-Secure). Every exception is either S or NS depending on target state.
  • Each fault type can be banked — S and NS each has its own MemManage/Bus/Usage handler.
  • When an exception crosses domains, hardware stacks additional context (DCRS bit in EXC_RETURN).
  • BusFault, HardFault and NMI target Secure by default; setting AIRCR.BFHFNMINS retargets them to Non-secure.
  • Stacking of additional state + an integrity signature prevents NS code from forging a return into S.
  • Priority register banks: when Secure software sets AIRCR.PRIS, Non-secure priorities are mapped into the lower half of the range, so NS handlers cannot outrank high-priority Secure ones.
  • SecureFault (exception #7) is dedicated to TZ violations — separate from MemManage/Bus/Usage.
Covered in depth in Presentation 07 — TrustZone.
26

References

  • Arm: Armv7-M / Armv8-M Architecture Reference Manual, section B1 (Exception model)
  • Arm: Cortex-M3 / M4 / M7 Devices Generic User Guide (DUI0552, DUI0553, DUI0646)
  • Joseph Yiu: Definitive Guide to Cortex-M3/M4, chapters 7–9
  • Joseph Yiu: Cortex-M23/M33 Definitive Guide, chapters 10–12 for TrustZone exception entry
  • FreeRTOS: Cortex-M interrupt priority behaviour (freertos.org → "Cortex-M interrupt priorities")
  • ARM Community blog: "Cutting through the confusion with Cortex-M interrupt priorities" by Joseph Yiu
  • Segger: "Cortex-M Fault Analysis" application note

Educational use.