0x4000_0000 is Device memory) without device-specific macros.
If you port an RTOS from STM32 to Nordic to NXP, the address of SCB->VTOR is identical. The peripheral offsets change; the core does not.
Attributes like "Normal" vs "Device" memory are typed by the region (default), but the MPU can override them on a region-by-region basis.
The 4 GiB map is carved into 512 MiB blocks, selected by address bits [31:29]. The PPB is Strongly-Ordered: every access must complete in program order, with no speculative fetch and no write buffering.
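As a small illustration of the fixed map, the default region of any address can be read off its top three bits. This is a sketch only: the enum and the function name `region_of` are hypothetical, and it models the architectural default map, not any MPU override.

```c
#include <stdint.h>

/* Architectural default regions (names are illustrative). */
typedef enum {
    REGION_CODE,        /* 0x0000_0000 .. 0x1FFF_FFFF */
    REGION_SRAM,        /* 0x2000_0000 .. 0x3FFF_FFFF */
    REGION_PERIPHERAL,  /* 0x4000_0000 .. 0x5FFF_FFFF */
    REGION_EXT_RAM,     /* 0x6000_0000 .. 0x9FFF_FFFF (two blocks) */
    REGION_EXT_DEVICE,  /* 0xA000_0000 .. 0xDFFF_FFFF (two blocks) */
    REGION_SYSTEM       /* 0xE000_0000 .. 0xFFFF_FFFF, includes the PPB */
} mem_region_t;

/* Classify an address by its 512 MiB block, i.e. bits [31:29]. */
mem_region_t region_of(uint32_t addr)
{
    static const mem_region_t by_block[8] = {
        REGION_CODE, REGION_SRAM, REGION_PERIPHERAL,
        REGION_EXT_RAM, REGION_EXT_RAM,
        REGION_EXT_DEVICE, REGION_EXT_DEVICE,
        REGION_SYSTEM
    };
    return by_block[addr >> 29];
}
```

For example, `region_of(0xE000ED08u)` (the address of SCB->VTOR) lands in the system block on every Cortex-M part.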
0x0000_0000 (boot alias — remapped by OPTR)
0x0800_0000 main flash — STM32
0x1FFF_0000 system memory (DFU bootloader)
0x1FFF_7800 OTP / option bytes
0x1FFF_F800 engineering bytes
The exact map depends entirely on the vendor. The architecture only guarantees that, out of reset, the core reads the initial stack pointer from 0x0000_0000 and the reset vector from 0x0000_0004.
.data, .bss.
0x2000_0000 – 0x200F_FFFF) is the bit-band region on M3/M4: single-bit atomic access via the alias.
MEMORY {
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1M
SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 192K
CCM (rw) : ORIGIN = 0x10000000, LENGTH = 64K
}
SECTIONS {
/* KEEP: the vector table must survive --gc-sections */
.text : { KEEP(*(.isr_vector)) *(.text*) *(.rodata*) } >FLASH
.data : { *(.data*) } >SRAM AT>FLASH
.bss : { *(.bss*) *(COMMON) } >SRAM
.ccm : { *(.ccm*) } >CCM
}
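The `>SRAM AT>FLASH` placement gives .data a load address in flash and a run address in SRAM, so startup code has to copy it across and clear .bss before main runs. A minimal sketch of that step follows. The script above exports no boundary symbols, so the function name and all parameters here are hypothetical stand-ins for whatever symbols a real script would define.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy initialised data from its load address (flash) to its run
 * address (SRAM), then zero the .bss range. All arguments are
 * hypothetical: a real startup file would take them from linker symbols. */
void init_ram_sections(uint8_t *data_run, const uint8_t *data_load,
                       size_t data_len, uint8_t *bss_start, size_t bss_len)
{
    memcpy(data_run, data_load, data_len);
    memset(bss_start, 0, bss_len);
}
```

On real hardware this runs from the reset handler before any C code that touches globals.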
0x4200_0000 (M3/M4 only).
Consecutive writes to a peripheral register are guaranteed to reach it in order, but the bus may still post (buffer) them. Reads may stall the core until the peripheral responds. No coalescing: two str instructions generate two bus transactions.
str that raises a bus fault may not fault precisely — see imprecise bus fault in presentation 02.
| Address | Block | Role |
|---|---|---|
| 0xE000_0000 | ITM | Instrumentation Trace Macrocell (printf over SWO) |
| 0xE000_1000 | DWT | Data Watchpoint & Trace (cycle counter, watchpoints) |
| 0xE000_2000 | FPB | Flash Patch & Breakpoint |
| 0xE000_E000 | SCS | System Control Space — NVIC, SCB, MPU, SysTick, FPU |
| 0xE004_0000 | TPIU | Trace Port Interface Unit |
| 0xE004_1000 | ETM | Embedded Trace Macrocell (optional) |
| 0xE00F_F000 | ROM tables | CoreSight component enumeration |
Every PPB access completes before the next begins. No speculation. No reordering. No write buffer. Good for configuring NVIC / MPU safely but means PPB writes are never fast.
On Cortex-M3 / M4, a bit-band alias region lets each bit of a 32-bit word in bit-band memory be addressed as its own 32-bit word in the alias.
alias = alias_base + (byte_offset << 5)
+ (bit_number << 2)
/* SRAM band: 0x2000_0000 … 0x200F_FFFF (1 MiB)
SRAM alias: 0x2200_0000 … 0x23FF_FFFF (32 MiB) */
#define BIT_BAND_SRAM(byte, bit) \
(*(volatile uint32_t *) \
(0x22000000u + \
((uint32_t)(byte) - 0x20000000u) * 32 + \
(bit) * 4))
Every bit in the 1 MiB band has exactly one 4-byte alias, so 8 Mi bits × 4 bytes = 32 MiB of alias space.
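The same mapping can be written as a plain function, which makes it easy to check on a host. This is a sketch: the function and constant names are illustrative, and it covers only the SRAM band given above.

```c
#include <stdint.h>

#define SRAM_BAND_BASE   0x20000000u  /* start of the 1 MiB bit-band region */
#define SRAM_ALIAS_BASE  0x22000000u  /* start of the 32 MiB alias region */

/* Alias word address for one bit of a byte in the SRAM bit-band region.
 * Each byte of the band spans 32 bytes of alias (8 bits x 4 bytes). */
uint32_t sram_bitband_alias(uint32_t byte_addr, unsigned bit)
{
    uint32_t byte_offset = byte_addr - SRAM_BAND_BASE;
    return SRAM_ALIAS_BASE + (byte_offset << 5) + ((uint32_t)bit << 2);
}
```

Bit 7 of the last byte of the band, `sram_bitband_alias(0x200FFFFFu, 7)`, maps to 0x23FF_FFFC, the last word of the alias range.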
; set bit 5 of *p atomically
disable_irq: cpsid i
ldr r1, [r0]
orr r1, r1, #0x20
str r1, [r0]
enable_irq: cpsie i
5 instructions, with IRQs masked across the read-modify-write.
; set bit 5 of address p via alias
ldr r0, =0x22000000 + ((p-0x20000000)*32) + (5*4)
movs r1, #1
str r1, [r0]
1 store. No IRQ mask. The bus performs the read-modify-write atomically on the CPU's behalf.
| Type | Speculative fetch | Merging writes | Ordering | Typical use |
|---|---|---|---|---|
| Normal | Yes | Yes (within a beat) | Weak — CPU may reorder | Code, SRAM, cached external RAM |
| Device | No | No (each access separate) | Preserved within same type | Peripheral registers |
| Strongly-Ordered | No | No | Preserved globally; each access completes before next | PPB / NVIC / MPU / CoreSight |
A peripheral register might have side-effects on read. Two reads must produce two bus cycles; the CPU cannot coalesce or speculate.
Ordering between two software threads or between code and a DMA that shares Normal memory.
msg.payload = 0xDEADBEEF;
__DMB();
msg.ready = 1; /* consumer reads ready then payload */
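For the thread-to-thread case, the same payload-then-flag ordering can be expressed portably with C11 fences. This is a sketch under assumptions: `message_t` and both function names are illustrative, and the claim that Cortex-M toolchains typically lower these fences to a DMB should be checked against your compiler's output. Sharing with a DMA engine still needs the explicit __DMB() shown above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t    payload;
    atomic_uint ready;
} message_t;

/* Producer: write the payload, then publish the flag. */
void message_publish(message_t *m, uint32_t value)
{
    m->payload = value;
    atomic_thread_fence(memory_order_release);   /* payload before flag */
    atomic_store_explicit(&m->ready, 1u, memory_order_relaxed);
}

/* Consumer: read the flag first, then the payload. */
bool message_try_consume(message_t *m, uint32_t *out)
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0u) {
        return false;                            /* nothing published yet */
    }
    atomic_thread_fence(memory_order_acquire);   /* flag before payload */
    *out = m->payload;
    return true;
}
```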
Before code that assumes a side-effect has taken hold — e.g. starting a DMA after programming its registers; kicking the watchdog.
DMA1->CHENA |= 1;
__DSB(); /* ensure reg write hit the peripheral */
| Situation | Barrier | Why |
|---|---|---|
| Writing SCB->VTOR then enabling IRQs | DSB + ISB | Later vector fetches use the new table |
| Enabling MPU / updating a region | DSB + ISB | Later fetches see new permissions |
| Changing CONTROL (switch to PSP / unprivileged) | ISB | Pipeline flush: instructions already in flight used the old mode |
| Self-modifying code (e.g. flash programming) | DSB + ISB + cache invalidate | Ensure cache coherency and pipeline reload |
| Configuring DMA registers then enabling | DSB | Make sure writes hit the DMA before enable |
| Clearing NVIC pending then enabling IRQ | DSB (optional) | Belt-and-braces; usually PPB writes are strongly-ordered anyway |
| Writing to flash configuration bytes | DSB | Ensure store drained before issuing program command |
| Clearing FPU CONTROL.FPCA | DSB + ISB | Avoid lazy-stacking ambiguity |
RBAR).
RASR.SIZE = log₂(size) − 1).
#include "core_cm4.h"
static void mpu_setup(void)
{
__DMB();
MPU->CTRL = 0; /* disable while editing */
/* Region 0: entire Code region, RO, executable, normal cacheable */
MPU->RNR = 0;
MPU->RBAR = 0x00000000UL;
MPU->RASR = (0 << 28) | /* XN=0, executable */
(6 << 24) | /* AP = RO priv+unpriv */
(0 << 19) | /* TEX=0 */
(1 << 18) | /* S=1 shareable */
(1 << 17) | /* C=1 cacheable */
(1 << 16) | /* B=1 bufferable */
(0 << 8) | /* SRD = 0 */
((29-1) << 1) | /* SIZE = 2^29 B = 512 MiB (the Code region) */
(1 << 0); /* ENABLE */
/* Region 1: SRAM RW priv, RO unpriv */
MPU->RNR = 1;
MPU->RBAR = 0x20000000UL;
MPU->RASR = (1 << 28) | /* XN=1 no exec */
(2 << 24) | /* AP priv RW, unpriv RO */
(1 << 18) | (1<<17) | (1<<16) |
((18-1) << 1) | /* SIZE = 256 KiB */
1;
MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | /* default map for priv */
MPU_CTRL_ENABLE_Msk;
__DSB(); __ISB();
}
After enabling, any unprivileged write to 0x2000_0000+ fires a MemManage fault — kernel stays in control.
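The hand-written `((18-1) << 1)` constants are easy to get wrong, so a helper that derives the SIZE field from a byte count can be useful. A minimal sketch, assuming the v7-M rules that regions are powers of two from 32 bytes up to 4 GiB; the function name is hypothetical and not part of CMSIS.

```c
#include <stdint.h>

/* Return the RASR.SIZE field value (log2(bytes) - 1) for a v7-M MPU
 * region, or -1 if the size is not a power of two in 32 B .. 4 GiB. */
int mpu_rasr_size_field(uint64_t bytes)
{
    if (bytes < 32u || bytes > (1ull << 32) || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int exponent = 0;
    while ((bytes >> exponent) != 1u) {
        exponent++;
    }
    return exponent - 1;
}
```

The result is shifted left by one to land in RASR bits [5:1], for example `(uint32_t)mpu_rasr_size_field(256u * 1024u) << 1` for the SRAM region above.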
MPU_MAIR pair of 32-bit registers (à la A-profile MAIR).
/* Set attr 0 = Normal WB inner+outer cacheable */
MPU->MAIR0 = (0xFFu << 0);
/* Region 0: flash 0x0800_0000 .. 0x082F_FFFF */
MPU->RNR = 0;
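MPU->RBAR = 0x08000000 | (0u<<3) /* SH = non-shareable */
| (3u<<1) /* AP = RO, priv + unpriv */
| 0; /* XN=0, executable */
MPU->RLAR = 0x082FFFE0 | (0u<<1) /* AttrIndx = 0 (MAIR0 attr 0) */
| 1; /* limit, ENABLE=1 */
MPU->CTRL = 1 | (1<<2); /* ENABLE + PRIVDEFENA */
__DSB(); __ISB(); /* new map applies to later accesses and fetches */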
Cleaner & more orthogonal than v7-M MPU. RTOSes target both via a single abstraction (e.g. FreeRTOS MPU port).
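The base/limit encoding can be wrapped in two small helpers. This is a sketch, assuming the Armv8-M field layout as I understand it (RBAR: BASE[31:5], SH[4:3], AP[2:1], XN[0]; RLAR: LIMIT[31:5], AttrIndx[3:1], EN[0]); the function names are hypothetical, and the positions are worth confirming against the architecture reference manual.

```c
#include <stdint.h>

/* Build an Armv8-M MPU_RBAR value. base must be 32-byte aligned.
 * ap: 0 = RW priv only, 1 = RW any, 2 = RO priv only, 3 = RO any. */
uint32_t mpu_v8_rbar(uint32_t base, uint32_t sh, uint32_t ap, uint32_t xn)
{
    return (base & 0xFFFFFFE0u) | ((sh & 3u) << 3) | ((ap & 3u) << 1) | (xn & 1u);
}

/* Build an Armv8-M MPU_RLAR value with the region enabled.
 * last_byte is the address of the final byte in the region. */
uint32_t mpu_v8_rlar(uint32_t last_byte, uint32_t attr_index)
{
    return (last_byte & 0xFFFFFFE0u) | ((attr_index & 7u) << 1) | 1u;
}
```

With these, the flash region above would be `mpu_v8_rbar(0x08000000u, 0, 3, 0)` and `mpu_v8_rlar(0x082FFFFFu, 0)`.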
FreeRTOS & Zephyr have MPU-enforced "user-mode" tasks (xTaskCreateRestricted, Zephyr user threads) that use this split.
SCB_InvalidateICache, SCB_CleanDCache, SCB_InvalidateDCache operations in CMSIS.
MSCR bits for easier cache maintenance from C.
The bus fabrics used on MCUs (AXI, AHB-5) don't implement a full snooping protocol. Full MOESI/MESI coherency would cost area and power that MCU-class parts cannot afford.
DMA buffers must be placed in non-cacheable memory (via MPU attrs on the region, or in a bypass region) or the code must clean before sending / invalidate before receiving.
memcpy(tx_buf, src, n);
/* push dirty lines out before DMA reads */
SCB_CleanDCache_by_Addr((uint32_t*)tx_buf, n);
dma_start(tx_buf, n);
/* make sure we don't have stale lines */
SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buf, n);
dma_start(rx_buf, n);
wait_for_dma_done();
/* after DMA writes, invalidate again so CPU reads from RAM */
SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buf, n);
/* buffer now safe to use from CPU */
rx_buf must be 32-byte aligned and sized in multiples of 32 B on M7 — Invalidate operates by cache line, and partial lines would clobber adjacent variables. Use __attribute__((aligned(32))) + round-up the length.
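The rounding itself is simple arithmetic. A sketch, assuming the 32-byte line size stated above; `cache_span_t` and `cache_span_for` are illustrative names, not CMSIS API.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start;   /* address rounded down to a line boundary */
    size_t    length;  /* byte count rounded up to whole lines */
} cache_span_t;

/* Expand [addr, addr + len) outward to whole cache lines. */
cache_span_t cache_span_for(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE - 1u) & mask;
    cache_span_t span = { start, (size_t)(end - start) };
    return span;
}
```

Rounding outward is harmless for a clean, but for an invalidate it discards whatever else shares those lines. That is why the buffer itself should be aligned and padded, so the span never covers a neighbouring variable.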
| Operation | Cost | When |
|---|---|---|
| SCB_EnableDCache() | one-time | Boot. Invalidates whole cache. |
| SCB_InvalidateDCache() | full-cache invalidate | Rare: after a big external memory change, before first DMA. |
| SCB_InvalidateDCache_by_Addr(p,n) | ~n/32 cycles | Around every DMA receive into a cacheable buffer (before start, after completion). |
| SCB_CleanDCache_by_Addr(p,n) | ~n/32 cycles | Before every DMA transmit from a cacheable buffer. |
| SCB_CleanInvalidateDCache_by_Addr(p,n) | ~n/32 cycles | Bidirectional DMA (rare). |
| SCB_InvalidateICache() | full I-cache invalidate | After self-modifying code or flash-patched code region. |
0x0000_0000 on STM32H7 when BOOT_ADD0 selects it).
0x2000_0000 on STM32H7, 128 KiB).
__REV, __REV16, __REVSH, __RBIT.
/* 'be32toh' on little-endian Cortex-M */
static inline uint32_t be32toh(uint32_t v)
{
uint32_t out;
__asm volatile ("rev %0, %1" : "=r"(out) : "r"(v));
return out;
}
/* Compiler already emits REV for __builtin_bswap32 */
uint32_t net = __builtin_bswap32(host);
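For reference, here is what REV computes, written in portable C so it can be checked on any host. The function names are illustrative. The byte-wise loader needs no swap at all and gives the same result regardless of host endianness.

```c
#include <stdint.h>

/* Reverse the four bytes of a word: the operation REV performs. */
uint32_t bswap32_portable(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Read a big-endian 32-bit value from a byte buffer, e.g. a network header. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```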
MSPLIM, PSPLIM: 32-bit, 8-byte aligned lower bound for each stack.
SP < SPLIM → UsageFault with UFSR.STKOF.
MSPLIM_S, PSPLIM_S, MSPLIM_NS, PSPLIM_NS.
/* FreeRTOS v8-M port snippet */
void vPortSetupTaskStack(StackType_t *top, StackType_t *bottom)
{
__set_PSPLIM((uint32_t)bottom);
}
/* Stack overflow → UsageFault at first push past limit */
void UsageFault_Handler(void)
{
if (SCB->CFSR & SCB_CFSR_STKOF_Msk) {
/* current task smashed its stack — reschedule */
configASSERT(!"stack overflow");
}
for (;;);
}
Dedicate a small MPU region or a separate SRAM bank with C=B=0. All DMA descriptors and buffers live there. No cache maintenance ever needed.
__attribute__((section(".itcm"))) for the fast path of the motor-control PWM ISR & the FIR kernel. Deterministic 1-cycle fetch; no I-cache jitter.
Large LUTs (sine tables, glyphs) in QSPI XIP region (0x9000_0000). Hot data in internal flash / AXI-SRAM. Linker script manages placement.
MPU region 0 covers SRAM top-slice RW priv, no unpriv. RTOS puts control blocks there; user tasks can never corrupt them.
Two buffers aligned to 32 B. DMA-receive into A while CPU processes B; on done, swap and invalidate D-cache for the newly-filled one.
Zero-init every word at boot to avoid ECC "uninitialised" traps on the first read — even on variables nominally uninitialised.
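The ping-pong DMA pattern above comes down to a small piece of bookkeeping. A sketch, with hypothetical names and a 64-byte buffer size chosen only for illustration; the cache invalidate and DMA restart are left as a comment because they depend on the vendor HAL.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_BYTES 64u   /* multiple of the 32-byte cache line */

typedef struct {
    _Alignas(32) uint8_t buf[2][BUF_BYTES];  /* both halves line-aligned */
    unsigned dma_index;                      /* buffer the DMA is filling */
} pingpong_t;

/* Call on DMA-complete: returns the buffer that was just filled and
 * hands the other one to the DMA for the next transfer. */
uint8_t *pingpong_swap(pingpong_t *pp)
{
    uint8_t *filled = pp->buf[pp->dma_index];
    pp->dma_index ^= 1u;
    /* On a cached Cortex-M7, invalidate the D-cache for `filled` here
     * (BUF_BYTES bytes) and restart the DMA on pp->buf[pp->dma_index]. */
    return filled;
}
```

The CPU then processes the returned buffer while the DMA fills the other.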
Arm — Armv7-M / Armv8-M Architecture Reference Manual — Chapter B3 (Memory model)
Arm — Cortex-M7 Technical Reference Manual (Section on cache maintenance, TCM)
Arm — Cortex-M Memory Protection Unit Programming Guide (Application note 321)
Joseph Yiu — Definitive Guide to Cortex-M3/M4, chapters on MPU, memory, bit-banding
STM32H7 Programming Manual (PM0253) — MPU, cache maintenance, TCM examples
ARM Community — "Cortex-M7 Cache maintenance for DMA" app note
Jean Labrosse — μC/OS-III Memory Management chapter — MPU + RTOS integration