ARM CORTEX-M · PRESENTATION 03

Exceptions, Interrupts & NVIC

Vector Table · Priorities · Tail-Chaining · Late Arrival · Faults

The hardware scheduler inside every Cortex-M

What Makes NVIC Special

The Nested Vectored Interrupt Controller is part of the core, not an SoC peripheral — same register layout on every Cortex-M.
Tightly-coupled to the pipeline: vector fetch and register stacking happen concurrently during exception entry.
Up to 240 external IRQs on v7-M (v6-M: up to 32; v8-M Mainline: up to 480).
Pre-emptive, priority-based, deterministic latency.
Exception entry & exit sequences implemented in hardware — the handler looks like an ordinary C function.

Compare to 8051 / AVR

Classic MCUs jump to the vector and leave everything else to software — you write a PUSH/POP prologue. The NVIC does this in hardware and adds tail-chaining / late-arrival on top.

Compare to A-profile GIC

A GIC is a separate IP block with its own memory-mapped registers sitting across a bus. Exception entry costs ~hundreds of cycles because of the MMU & bus round-trip. NVIC is in the core and exception entry is 12 cycles on M3.

Exception Model Overview

System Exceptions — The Full List

#	Name	Priority	Purpose	Available on
1	Reset	-3 (fixed, highest)	Power-on, watchdog, SW reset via AIRCR.SYSRESETREQ	All
2	NMI	-2 (fixed)	Non-maskable — clock failure, tamper, SoC-routed critical input	All
3	HardFault	-1 (fixed)	Last-resort fault. Escalation target when another fault is disabled or nested.	All
4	MemManage	prog	MPU violation, XN (execute-never) violation	v7-M, v8-M Main
5	BusFault	prog	Bus error (abort from memory system)	v7-M, v8-M Main
6	UsageFault	prog	Undefined instruction, trapped unaligned access, divide-by-zero, stack-limit hit (v8-M), invalid EXC_RETURN	v7-M, v8-M Main
7	SecureFault	prog	TrustZone violation (NS access to S resource etc.)	v8-M Main with Security Extension; Secure state only
11	SVCall	prog	Supervisor call — `SVC #imm`. RTOS syscall trampoline.	All
12	DebugMonitor	prog	Debug events when halting debug is disabled	v7-M, v8-M
14	PendSV	prog	Software-pended exception — the classic RTOS context-switch vector	All
15	SysTick	prog	24-bit down-counter interrupt — the standard OS tick	Mostly all (optional on M1)

Vector Table Layout

Offset    Word
0x0000    Initial MSP (loaded into SP at reset)
0x0004    Reset_Handler
0x0008    NMI_Handler
0x000C    HardFault_Handler
0x0010    MemManage_Handler
0x0014    BusFault_Handler
0x0018    UsageFault_Handler
0x001C    SecureFault_Handler  (v8-M)
0x0020    (reserved)
0x0024    (reserved)
0x0028    (reserved)
0x002C    SVC_Handler
0x0030    DebugMon_Handler
0x0034    (reserved)
0x0038    PendSV_Handler
0x003C    SysTick_Handler
0x0040    IRQ0_Handler         ; vector 16
0x0044    IRQ1_Handler
...       ...

Table lives at the address held in VTOR (SCB->VTOR, at 0xE000_ED08).
At reset VTOR = 0 on most parts, so the table must sit at address 0 or the SoC must alias flash/ROM there.
Relocating to SRAM is a common trick: copy table, update VTOR, enable one ISR at a time.
VTOR must be aligned to the table size (num_vectors × 4 bytes) rounded up to a power of two, with a 128-byte minimum; large tables need 512 or 1024 bytes.
v8-M TrustZone: two VTORs — VTOR_S (privileged-Secure) and VTOR_NS (Non-Secure).

Gotcha: omit the __DSB(); __ISB(); after updating VTOR and an exception taken right afterwards may still fetch its vector from the old table.
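
The alignment rule can be checked with a few lines of C. This is a minimal sketch, assuming 4-byte vectors and the 128-byte minimum quoted above; `vtor_alignment` is a hypothetical helper name, not a CMSIS function.

```c
#include <stdint.h>

/* Hypothetical helper: smallest legal VTOR alignment, in bytes, for a
 * table of num_vectors entries (16 system slots + external IRQs).
 * Assumes 4-byte vectors and a 128-byte architectural minimum. */
static uint32_t vtor_alignment(uint32_t num_vectors)
{
    uint32_t table_bytes = num_vectors * 4u;
    uint32_t align = 128u;

    /* Round up to the next power of two that covers the whole table. */
    while (align < table_bytes) {
        align <<= 1;
    }
    return align;
}
```

For a part with 82 external IRQs (98 vectors, 392 bytes) this returns 512, so an SRAM copy of the table would need 512-byte alignment in the linker script or via an alignment attribute.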

Vector Table in C

/* Classic CMSIS-style vector table for STM32F4 */
extern uint32_t _estack;            /* provided by linker */

void Reset_Handler(void);
void Default_Handler(void);
void NMI_Handler(void)          __attribute__((weak, alias("Default_Handler")));
void HardFault_Handler(void)    __attribute__((weak, alias("Default_Handler")));
void SysTick_Handler(void)      __attribute__((weak, alias("Default_Handler")));
void EXTI0_IRQHandler(void)     __attribute__((weak, alias("Default_Handler")));
/* … one per peripheral … */

__attribute__((section(".isr_vector"), used))
void (* const g_vector_table[])(void) = {   /* const array of pointers to void(void) handlers */
    (void (*)(void)) &_estack,       /* 0x000  initial MSP */
    Reset_Handler,                   /* 0x004 */
    NMI_Handler,                     /* 0x008 */
    HardFault_Handler,               /* 0x00C */
    /* system vectors … */
    SysTick_Handler,                 /* 0x03C */
    EXTI0_IRQHandler,                /* 0x040 — IRQ0 */
    /* peripherals … */
};

The weak / alias pattern lets unused handlers collapse to Default_Handler, while any handler the user defines (e.g. EXTI0_IRQHandler) overrides the weak symbol at link time, with no registration API. Clean, zero-overhead.

Priority Model

Lower numerical value = higher priority (counter-intuitive — source of many bugs).
System exceptions Reset/NMI/HardFault have fixed negative priorities and cannot be reordered.
All other exceptions (incl. configurable faults, PendSV, SysTick, external IRQs) are programmable.
Priority field is architecturally 8 bits but only the top N bits are implemented.

How many bits?

Core	Priority bits	Levels
M0, M0+, M23	2	4
M1	1 or 2	2 – 4
M3, M4, M7	3–8 (impl. choice)	8 – 256
M33, M55, M85	3–8 (impl. choice)	8 – 256

Many STM32 (ST) and LPC (NXP) parts implement 4 bits → 16 levels. Nordic nRF52 implements 3 bits → 8 levels. Check __NVIC_PRIO_BITS in the device header for the actual count.

Why Priority Bits Live in the Upper Half

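[Figure: the full 8-bit priority field compared with a 4-bit implementation (ST, bits [7:4]) and a 3-bit implementation (Nordic, bits [7:5]); the remaining low bits are unimplemented.]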

Implementations that add more bits keep existing software working: a 0x20 priority value retains the same relative ordering across vendors.
Writing to unimplemented low bits is harmless — they read back as 0.
Gotcha: a raw register value of 0x01 on a 4-bit-impl MCU reads back as 0x00, i.e. priority 0, the highest (0x0F is masked to 0 the same way). Set priorities via NVIC_SetPriority(), which takes the unshifted level (0–15 with 4 bits) and performs the shift.
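
The shift and the masking of unimplemented bits can be modelled on a host PC. The sketch below is illustrative only: both function names are hypothetical, and `prio_bits` stands in for the device's __NVIC_PRIO_BITS.

```c
#include <stdint.h>

/* Convert an unshifted priority level to the raw 8-bit register value.
 * prio_bits is the number of implemented bits (1..8). */
static uint8_t prio_level_to_raw(uint32_t level, uint32_t prio_bits)
{
    return (uint8_t)((level << (8u - prio_bits)) & 0xFFu);
}

/* What the hardware keeps of a raw write: unimplemented low bits read as 0. */
static uint8_t prio_raw_readback(uint8_t raw, uint32_t prio_bits)
{
    uint8_t mask = (uint8_t)(0xFFu << (8u - prio_bits));
    return (uint8_t)(raw & mask);
}
```

With 4 bits, level 5 becomes 0x50, while raw writes of 0x01 and 0x0F both read back as 0x00.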

Priority Grouping — Preempt vs Sub-Priority

Each priority field is split into preempt priority (upper) and sub-priority (lower) by AIRCR.PRIGROUP.

[Figure: preempt / sub-priority split of the 8-bit field for PRIGROUP = 0, 4 and 7.]

Preempt = can I interrupt another ISR?
Sub = who runs first if multiple are simultaneously pending at the same preempt level?
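
The split itself is simple bit arithmetic: bits [7 : PRIGROUP+1] form the preempt priority and bits [PRIGROUP : 0] the sub-priority. The following is a host-side sketch of that rule; `prio_split_t` and `split_priority` are names made up for illustration (CMSIS offers NVIC_DecodePriority for real code).

```c
#include <stdint.h>

typedef struct {
    uint32_t preempt;   /* decides whether one ISR can interrupt another */
    uint32_t sub;       /* tie-breaker among pending ISRs of equal preempt level */
} prio_split_t;

/* Split a raw 8-bit priority value according to AIRCR.PRIGROUP (0..7). */
static prio_split_t split_priority(uint8_t raw, uint32_t prigroup)
{
    uint32_t sub_bits = (prigroup & 7u) + 1u;   /* 1..8 sub-priority bits */
    prio_split_t out;

    out.preempt = (uint32_t)raw >> sub_bits;
    out.sub     = (uint32_t)raw & ((1u << sub_bits) - 1u);
    return out;
}
```

With PRIGROUP = 7 the preempt field is empty, so no programmable exception can pre-empt another; only the fixed-priority exceptions still can.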

Practical advice

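On FreeRTOS, configPRIO_BITS is the implemented priority-bit count. FreeRTOS expects every implemented bit to be a preempt bit (PRIGROUP = 0 satisfies this; on STM32 the usual choice is NVIC_PRIORITYGROUP_4). Sub-priority complicates reasoning, so pick one scheme and stay with it.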
For kernel-aware interrupts (that may call xQueueSendFromISR), priority must be numerically ≥ configMAX_SYSCALL_INTERRUPT_PRIORITY.
Higher-priority (smaller number) ISRs may run "above the kernel" and must not touch RTOS objects.
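
The rule reduces to a single comparison on raw priority values. The sketch below is a hypothetical helper, not a FreeRTOS API; it assumes both arguments are raw (already shifted) register values.

```c
#include <stdbool.h>
#include <stdint.h>

/* raw_prio:        the ISR's raw priority register value
 * max_syscall_raw: the port's configMAX_SYSCALL_INTERRUPT_PRIORITY (raw)
 * Returns true when the ISR is allowed to call FromISR kernel functions. */
static bool isr_may_use_kernel_api(uint8_t raw_prio, uint8_t max_syscall_raw)
{
    /* Numerically greater-or-equal means equal or lower urgency. */
    return raw_prio >= max_syscall_raw;
}
```

A check like this can back an assertion at the top of any ISR that talks to the kernel.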

Exception Entry Sequence (M3/M4)

[Timing diagram: the pipeline moves from thread code through stacking into the ISR. The data bus writes xPSR, PC, LR, R12, R3, R2, R1, R0 while the instruction bus fetches the vector and then the first ISR instructions.]

Entry (12 cycles)

Stacking of 8 words overlapped with vector fetch.
Load PC from vector → start fetching ISR.
On M7 (dual-issue + cache), entry is about 10 cycles in the best case, i.e. on a cache hit.

Exit (12 cycles)

BX lr with EXC_RETURN pattern triggers unstack.
8 words popped; IPSR & control restored; instructions of pre-empted code resume.
Unless another exception is pending with enough priority to pre-empt the context being returned to → tail-chain (next slide).

Tail-Chaining

Without tail-chaining (hypothetical)

[Timeline: Thread → Stack (8 words) → ISR1 → Unstack → Stack → ISR2 → Unstack → Thread]

Two 12-cycle stacks + two 12-cycle unstacks = 48 cycles of pure overhead.

With tail-chaining (actual)

[Timeline: Thread → Stack (8 words) → ISR1 → Chain (6 cycles) → ISR2 → Unstack → Thread]

Frame stays on stack; only 6 cycles to fetch new vector & enter ISR2. That replaces a 12-cycle unstack plus a 12-cycle stack, saving 18 cycles per chained pair.

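Tail-chaining triggers whenever, at the moment of return, the NVIC sees a pending exception with enough priority to pre-empt the context being returned to (typically one at the same or lower priority that was held off while the current handler ran). The CPU simply re-uses the existing stack frame and does a new vector fetch.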
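
The cycle arithmetic generalises to any number of back-to-back handlers. The model below is a rough sketch that uses the M3/M4 figures quoted on these slides (12-cycle stack, 12-cycle unstack, 6-cycle chain) and ignores wait states; the function names are illustrative.

```c
#include <stdint.h>

enum { STACK_CYCLES = 12, UNSTACK_CYCLES = 12, CHAIN_CYCLES = 6 };

/* Every ISR pays a full stack and a full unstack. */
static uint32_t overhead_without_chaining(uint32_t n_isrs)
{
    return n_isrs * (uint32_t)(STACK_CYCLES + UNSTACK_CYCLES);
}

/* One stack, one unstack, and a 6-cycle chain between consecutive ISRs. */
static uint32_t overhead_with_chaining(uint32_t n_isrs)
{
    if (n_isrs == 0u) {
        return 0u;
    }
    return (uint32_t)STACK_CYCLES
         + (n_isrs - 1u) * (uint32_t)CHAIN_CYCLES
         + (uint32_t)UNSTACK_CYCLES;
}
```

For two handlers the model gives 48 cycles without chaining and 30 with it; each further chained handler saves another 18 cycles.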

Late-Arrival Preemption

Scenario

A low-priority IRQ fires → CPU starts stacking. Midway through, a higher-priority IRQ fires.

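[Timeline: in the normal case, thread code is followed by stacking for the low-priority IRQ. In the late-arrival case, a higher-priority IRQ becomes pending during stacking; stacking completes and the higher-priority handler runs first.]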

The NVIC monitors pending IRQs during stacking.
If a higher-priority exception arrives, CPU completes the in-flight stacking (frame is valid) and redirects the vector fetch to the new target.
The lower-priority exception stays pending → eventually tail-chains after the higher one.

Net effect: worst-case interrupt latency for a high-priority ISR is bounded by the stack time of any in-flight entry — not by the low-priority ISR executing to completion.

Pop-Preemption

During exit unstacking…

[Timeline: during the pop, a higher-priority IRQ (HI2) becomes pending; the pop is aborted and HI2 runs.]

If another IRQ preempts while the CPU is in the middle of unstacking, pop is aborted.
The current frame is reused — no re-push — and the new ISR vector is fetched.
Requires the frame to still be valid on the current stack; CPU guarantees that by only aborting in the early pop cycles.

Tail-chaining, late arrival, and pop-preemption together give Cortex-M a worst-case interrupt latency dominated by the 8-word stacking, not by handler-chain unwinding. This is the core reason the family dominates hard-real-time niches.

Lazy Floating-Point Stacking

Without lazy stacking

On exception entry, if FPU is active, CPU always pushes 18 extra FP words (S0–S15, FPSCR, padding).
~17 extra cycles on M4, 20+ on M7.
Wasted if the ISR does not touch the FPU.

With lazy stacking

CPU reserves space (moves SP down by 18 words) but skips the writes.
Sets FPCCR.LSPACT = 1.
If the ISR ever executes an FP instruction → hardware detects this, performs the deferred save, then continues.
If it never does → no FP registers written. Full save / restore avoided.

/* Enable lazy stacking (default on M4F with FPU on) */
#include <stdint.h>

#define FPCCR   (*(volatile uint32_t*)0xE000EF34)
#define FPCCR_ASPEN  (1U << 31)
#define FPCCR_LSPEN  (1U << 30)

/* Most startup code sets both:
   ASPEN = automatic FP state preservation on exception
   LSPEN = lazy: defer until the handler uses FPU */
static inline void fpu_enable_lazy_stacking(void)   /* illustrative name */
{
    FPCCR |= FPCCR_ASPEN | FPCCR_LSPEN;
}

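Trap: if a non-task context (e.g. a HardFault handler) executes an FP instruction while a lazy save is still outstanding (LSPACT = 1), the hardware first writes the deferred FP state into the reserved frame. If that frame address is bad (for example after a stack overflow), the save itself faults. In fault handlers: disable the FPU (or never touch it).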
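
Whether a given exception frame includes the 18 reserved FP words can be read from bit 4 of EXC_RETURN (FType). The helper below is a sketch for v7-M with an FPU; it does not cover the extra state that v8-M Secure entry may add, and `stacked_frame_words` is a hypothetical name.

```c
#include <stdint.h>

#define EXC_RETURN_FTYPE (1u << 4)   /* 1 = basic frame, 0 = extended (FP) frame */

/* Number of 32-bit words the hardware reserved for this exception frame. */
static uint32_t stacked_frame_words(uint32_t exc_return)
{
    /* Basic frame: R0-R3, R12, LR, PC, xPSR = 8 words.
     * Extended frame adds S0-S15, FPSCR and one padding word = 26 words. */
    return (exc_return & EXC_RETURN_FTYPE) ? 8u : 26u;
}
```

A fault handler that walks the stack can use this to skip the right number of words before reading the pre-exception stack contents.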

Wake-up Interrupt Controller (WIC)

WIC is a small always-on block that duplicates the NVIC's mask logic.
In deep sleep, core & NVIC are clock-gated (M7+) or power-gated (state-retentive).
When a masked-in IRQ asserts → WIC asserts WAKEUP to the power-management IP → clocks restore → NVIC takes the interrupt normally.
Entry latency grows by clock-start time (typ. a few µs to ~100 µs depending on LDO/PLL).

Needed because without the WIC, the NVIC cannot sense the IRQ line while its clock is off. With the WIC, sub-µA deep sleep with IRQ wake is possible.

SVCall vs PendSV — Two Roles

SVCall synchronous

Triggered by the SVC #imm instruction in user code.
Runs at configured priority (typically high) — returns via unstack as normal.
Classic use: syscall trampoline — user task requests kernel action; handler reads the imm byte to dispatch.
On v8-M, SVC from Non-Secure traps into NS handler; S-side has its own.

PendSV software-pended

Set by writing SCB->ICSR.PENDSVSET.
Runs at its own priority slot; typically configured as the lowest priority.
Used by RTOS kernels for the context switch — PendSV is guaranteed to run after all higher-priority IRQs finish, so the switch can never pre-empt a critical ISR.
Classic FreeRTOS pattern: SysTick or kernel syscall pends PendSV → PendSV_Handler performs the switch.
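
Pending PendSV is a single write of bit 28 (PENDSVSET) to ICSR. The sketch below takes the register address as a parameter so it can be exercised on a host; on target the argument would be the ICSR address 0xE000ED04. The function name is made up for illustration.

```c
#include <stdint.h>

#define ICSR_PENDSVSET (1u << 28)

/* Request a context switch by pending PendSV.
 * icsr: pointer to the ICSR register (or a stand-in variable in a test). */
static void request_context_switch(volatile uint32_t *icsr)
{
    /* Plain write, not read-modify-write: the set/clear bits act only
     * when written as 1, so writing 0 to the others has no effect. */
    *icsr = ICSR_PENDSVSET;
}
```

Real kernels usually follow the write with DSB and ISB barriers so the switch takes effect at a defined point.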

SysTick — The Built-in Tick Timer

24-bit down-counter in the SCS. Counts at core clock or external (impl.-defined) clock.
Fires the SysTick exception (#15) when the counter counts down to 0, then reloads from SYST_RVR.
Register set:
- SYST_CSR — enable, tickint, clksource, countflag.
- SYST_RVR — reload value.
- SYST_CVR — current.
- SYST_CALIB — implementation-provided 10 ms calibration constant.

/* 1 kHz tick on a 168 MHz M4 */
#define SYST_CSR  (*(volatile uint32_t*)0xE000E010)
#define SYST_RVR  (*(volatile uint32_t*)0xE000E014)
#define SYST_CVR  (*(volatile uint32_t*)0xE000E018)

void systick_init(uint32_t core_hz, uint32_t tick_hz)
{
    SYST_RVR = core_hz / tick_hz - 1; /* 167 999 */
    SYST_CVR = 0;                     /* clear */
    SYST_CSR = (1U<<2) |   /* CLKSOURCE = core */
               (1U<<1) |   /* TICKINT */
               (1U<<0);    /* ENABLE */
}

A single 24-bit counter means max interval at 168 MHz ≈ 99.9 ms. Tickless RTOS designs pair SysTick with the RTC for longer sleeps.
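
Because the reload register holds only 24 bits, it is worth validating the requested tick rate before programming it. The following is a minimal sketch of such a check; `systick_reload_value` is a hypothetical helper that mirrors the arithmetic in systick_init above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* 24-bit counter */

/* Compute the SYST_RVR value for the requested tick rate.
 * Returns false if the rate cannot be produced by a single 24-bit reload. */
static bool systick_reload_value(uint32_t core_hz, uint32_t tick_hz,
                                 uint32_t *reload)
{
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t ticks = core_hz / tick_hz;
    if (ticks < 2u) {
        return false;               /* a reload of 0 would not tick at all */
    }
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return false;               /* interval too long for 24 bits */
    }
    *reload = ticks - 1u;
    return true;
}
```

At 168 MHz a 1 kHz tick yields 167999, while a 10 Hz tick (100 ms) is rejected because it exceeds the 99.9 ms limit.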

Fault Taxonomy

Enable SCB->SHCSR.{MEMFAULTENA, BUSFAULTENA, USGFAULTENA} at boot so specific faults have distinct handlers. Otherwise every fault looks like HardFault, which is harder to debug.
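
Enabling the three configurable faults is one read-modify-write of SHCSR (bits 16, 17 and 18). The sketch takes the register pointer as a parameter so it can run against a plain variable; on target the argument would be the SHCSR address 0xE000ED24, or &SCB->SHCSR with CMSIS. The function name is illustrative.

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* Give MemManage, BusFault and UsageFault their own handlers instead of
 * letting them escalate to HardFault. */
static void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```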

Decoding a HardFault

Where the evidence lives

SCB->HFSR: hard-fault status. FORCED=1 means an escalated configurable fault.
SCB->CFSR: configurable fault status register. Three fields: MMFSR (bits 7:0), BFSR (bits 15:8), UFSR (bits 31:16).
SCB->MMFAR: fault address, valid only if MMFSR.MMARVALID is set.
SCB->BFAR: bus-fault address, valid only if BFSR.BFARVALID is set.
Auto-stacked frame holds the faulting PC, LR, xPSR.

__attribute__((naked))
void HardFault_Handler(void)
{
    __asm volatile (
        "TST lr, #4          \n"
        "ITE EQ              \n"
        "MRSEQ r0, MSP       \n"
        "MRSNE r0, PSP       \n"
        "B hard_fault_c      \n");
}

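void hard_fault_c(uint32_t *sf)
{
    /* sf points at the auto-stacked frame: R0, R1, R2, R3, R12, LR, PC, xPSR */
    uint32_t pc   = sf[6];   /* stacked PC: where the fault was taken */
    uint32_t lr   = sf[5];   /* stacked LR: caller of the faulting code */
    uint32_t psr  = sf[7];
    uint32_t cfsr = SCB->CFSR;
    /* write to RTT / flash / UART, then NVIC_SystemReset */
    (void)pc; (void)lr; (void)psr; (void)cfsr;   /* silence unused warnings until logging is added */
}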

Pattern: find the stack used at fault (MSP/PSP via bit 2 of LR/EXC_RETURN), grab stacked PC + CFSR, log it, reset.

Common Fault Causes — Recipes

Symptom	CFSR bit	Typical cause
UsageFault: UNDEFINSTR	UFSR.UNDEFINSTR	Jumped into data; bit-rot flash. (A function pointer with the Thumb bit cleared raises INVSTATE instead.)
UsageFault: INVPC	UFSR.INVPC	EXC_RETURN corrupted (stack smash, bad handler asm).
UsageFault: UNALIGNED	UFSR.UNALIGNED	`CCR.UNALIGN_TRP=1`; unaligned LDM/STM/LDRD/STRD (always faults); unaligned access to Device memory. On v6-M any unaligned access is a HardFault.
UsageFault: DIVBYZERO	UFSR.DIVBYZERO	`CCR.DIV_0_TRP=1`; signed/unsigned divide by 0.
BusFault: PRECISERR	BFSR.PRECISERR	Illegal peripheral address; access to region with no mapped slave.
BusFault: IMPRECISERR	BFSR.IMPRECISERR	Write-buffered error; fault address is not in BFAR. Force DSB after writes or disable the write buffer while debugging.
MemManage: DACCVIOL / IACCVIOL	MMFSR.DACCVIOL / MMFSR.IACCVIOL	MPU denied the access; check permissions & region config.
Stack-limit hit	UFSR.STKOF (v8-M)	SP went below PSPLIM/MSPLIM. Set the limits with the CMSIS __set_PSPLIM() / __set_MSPLIM() intrinsics.
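
The table maps directly onto a small lookup that a fault logger can run over the captured CFSR value. This is a sketch covering only the bits listed above; `cfsr_bits` and `cfsr_first_cause` are hypothetical names, and a production decoder would report every set bit rather than the first.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    mask;   /* bit position within the 32-bit CFSR */
    const char *name;
} cfsr_bit_t;

/* MMFSR occupies bits 7:0, BFSR bits 15:8, UFSR bits 31:16. */
static const cfsr_bit_t cfsr_bits[] = {
    { 1u << 0,  "MMFSR.IACCVIOL"   },
    { 1u << 1,  "MMFSR.DACCVIOL"   },
    { 1u << 9,  "BFSR.PRECISERR"   },
    { 1u << 10, "BFSR.IMPRECISERR" },
    { 1u << 16, "UFSR.UNDEFINSTR"  },
    { 1u << 18, "UFSR.INVPC"       },
    { 1u << 20, "UFSR.STKOF"       },   /* v8-M only */
    { 1u << 24, "UFSR.UNALIGNED"   },
    { 1u << 25, "UFSR.DIVBYZERO"   },
};

/* Return the name of the lowest-numbered known cause bit, or NULL. */
static const char *cfsr_first_cause(uint32_t cfsr)
{
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; ++i) {
        if (cfsr & cfsr_bits[i].mask) {
            return cfsr_bits[i].name;
        }
    }
    return NULL;
}
```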

CMSIS NVIC API

/* Enable a peripheral IRQ at priority level 5 (register value 0x50 with 4 priority bits) */
NVIC_SetPriority(EXTI0_IRQn, 5);
NVIC_EnableIRQ(EXTI0_IRQn);

/* Pend in software */
NVIC_SetPendingIRQ(EXTI0_IRQn);

/* Get active ISR# (nonzero in Handler mode) */
uint32_t vect = __get_IPSR();

/* Critical section: masks IRQs whose raw priority value is >= 0x20 (BASEPRI takes the shifted register value) */
uint32_t prev = __get_BASEPRI();
__set_BASEPRI(0x20);
/* … */
__set_BASEPRI(prev);

Under the hood

NVIC_EnableIRQ(n) → sets bit n%32 in NVIC->ISER[n/32].
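NVIC_SetPriority(n, p) → writes (p << (8 - __NVIC_PRIO_BITS)) & 0xFF to NVIC->IP[n] (named IPR[n] in the v8-M headers); p is the unshifted level.
Faults / PendSV / SysTick (negative IRQn values in CMSIS) go to the SCB system handler priority registers (SHPR1..3) instead.
On v8-M the NVIC and SCB registers are banked in hardware, so the same CMSIS call targets the Secure or NS bank according to the current state; Secure code reaches the NS bank through the NVIC_NS / SCB_NS aliases.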
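
The ISER indexing can be sketched without any hardware. The code below mimics what the enable call does, under the assumption that `iser` points at the first ISER word (or at a stand-in array in a test); the function names are illustrative, not CMSIS symbols.

```c
#include <stdint.h>

/* Which 32-bit ISER word holds the enable bit for this IRQ. */
static uint32_t iser_word(uint32_t irqn) { return irqn >> 5; }

/* Bit mask for this IRQ inside that word. */
static uint32_t iser_mask(uint32_t irqn) { return 1u << (irqn & 31u); }

/* ISER is write-1-to-set, so a plain store is enough: zeros written to
 * the other bits leave their enables untouched on real hardware. */
static void enable_irq(volatile uint32_t *iser, uint32_t irqn)
{
    iser[iser_word(irqn)] = iser_mask(irqn);
}
```

IRQ 40, for example, lands in word 1 at bit 8.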

ISR Design Patterns

Top-half only

ISR does the whole job — short, deterministic, no RTOS calls. Fits GPIO debouncing, simple UART byte echo, timer tick that just sets a flag.

Top + bottom half

ISR captures hardware state into a lock-free queue / ring buffer, then returns fast. Worker thread or deferred procedure handles the slow part. Use xQueueSendFromISR/xTaskNotifyFromISR with the pxHigherPriorityTaskWoken pattern.
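
Where no RTOS queue is available, the lock-free ring buffer mentioned above can be a single-producer / single-consumer design. The following is a minimal sketch using C11 atomics, assuming exactly one ISR writes and one thread reads; the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u                 /* must be a power of two */

typedef struct {
    uint8_t     data[RB_SIZE];
    atomic_uint head;               /* written only by the ISR (producer) */
    atomic_uint tail;               /* written only by the thread (consumer) */
} ring_t;

/* Call from the ISR. Returns false (byte dropped) when the buffer is full. */
static bool ring_put(ring_t *rb, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

    if (head - tail == RB_SIZE) {
        return false;
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    /* Release: the data write becomes visible before the new head. */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Call from the worker thread. Returns false when the buffer is empty. */
static bool ring_get(ring_t *rb, uint8_t *byte)
{
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&rb->head, memory_order_acquire);

    if (head == tail) {
        return false;
    }
    *byte = rb->data[tail & (RB_SIZE - 1u)];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

Neither side ever disables interrupts, so the ISR stays short and the thread never adds latency to it. A zero-initialised `ring_t` with static storage is ready to use.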

Tickless / event-driven

Two priority bands — hard RT IRQs above configMAX_SYSCALL_INTERRUPT_PRIORITY must not touch the RTOS; soft RT IRQs below it may signal tasks.

Golden rule: an ISR should return in tens of microseconds at most. Anything longer steals determinism from every lower-priority source. If the work is unbounded, use DMA + completion IRQ, or defer.

FreeRTOS — Exception Priority Rules

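The exact numerical threshold is configured per port. With 4 impl. bits on STM32, a typical split sets configMAX_SYSCALL_INTERRUPT_PRIORITY to 0x50: ISRs at raw priority 0x50–0xF0 may use the FromISR API, ISRs at 0x00–0x40 must not, and PendSV/SysTick sit at 0xF0 (lowest).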

Common Bugs

1. Priority inversion via PRIMASK

FreeRTOS taskENTER_CRITICAL() sets BASEPRI, not PRIMASK, so high-priority "above-kernel" IRQs can still fire. Calling __disable_irq() in user code sets PRIMASK and blocks those IRQs too, adding latency to exactly the handlers that were meant to stay unmasked.

2. Setting NVIC priority before clearing pending

Changing priority of an ISR that is already pending has implementation-defined timing. Always disable IRQ, clear pending, set priority, then enable.

3. Calling FromISR on a too-high-priority IRQ

Silent corruption of kernel state; often manifests as a random HardFault hours later. Assert priority in the ISR.

4. Forgetting to clear peripheral pending bit

NVIC pending bit clears on entry, but the peripheral flag still asserts → re-entry forever. Always acknowledge in the peripheral's status register.

5. Write-buffered imprecise BusFault

Fault arrives cycles after the offending store — PC in the stacked frame is not where the bug is. DSB after suspect writes, or disable write-buffer via ACTLR.DISDEFWBUF during debug.

6. FPU touched in a fault handler with lazy stacking on

Partial frame + float op = spectacular crash. Build fault handlers so they contain no FP instructions (for example with GCC's -mgeneral-regs-only, where the toolchain supports it) or disable the FPU.

v8-M / TrustZone Additions

Two vector tables: VTOR_S (Secure) and VTOR_NS (Non-Secure). Every exception is either S or NS depending on target state.
MemManage and UsageFault are banked: S and NS each have their own handler and status bits. BusFault is not banked.
When an exception crosses domains, hardware stacks additional context (DCRS bit in EXC_RETURN).
BusFault, HardFault and NMI target Secure by default; setting AIRCR.BFHFNMINS redirects them to Non-Secure.

Stacking of additional state + an integrity signature prevents NS code from forging a return into S.
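Priority registers are banked. When Secure code sets AIRCR.PRIS at boot, NS priorities are mapped into the lower half of the range, so NS handlers cannot outrank Secure handlers placed in the upper half.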
SecureFault (exception #7) is dedicated to TZ violations — separate from MemManage/Bus/Usage.

Covered in depth in Presentation 07 — TrustZone.


Educational use.