ARM AMBA · PRESENTATION 02

AHB & APB

The Original AMBA Protocols · Still Everywhere Inside Your MCU
AHB · AHB-Lite · AHB5 · APB2/3/4/5 · Multi-Layer AHB · HREADY · PSEL/PENABLE
02

Why AHB & APB Still Matter

  • Every Cortex-M MCU you touch — STM32, nRF52, LPC, SAM, RP2040 — uses AHB-Lite for the CPU's instruction & data buses and APB for its register-mapped peripherals.
  • Cortex-M55 and M85, announced in 2020 and 2022, still expose AHB5 ports alongside their AXI master interface. AXI has not displaced AHB: it remains the low-power bus of record.
  • An AHB transaction is one address phase and one data phase. It is the easiest high-speed bus to reason about — especially useful in interviews.
  • APB is so simple that a whole UART can live in fewer than 200 lines of Verilog.

Rule of thumb

Use AHB when you need burst bandwidth but only one or two masters. Use APB for every register-mapped peripheral. Use AXI when you cross into a high-performance, high-latency, out-of-order world.

03

AHB — Signals

// Address phase (driven by master)
HCLK        // single clock
HRESETn     // active-low reset
HADDR [31:0]
HTRANS[1:0] // 00 IDLE, 01 BUSY,
            // 10 NONSEQ, 11 SEQ
HWRITE
HSIZE [2:0] // 000 byte, 001 halfword, 010 word, 011 dword…
HBURST[2:0] // SINGLE, INCR, WRAP4/8/16, INCR4/8/16
HPROT [3:0] // cacheable, bufferable, privileged, D/I
HMASTLOCK

// Data phase (driven by master/slave)
HWDATA[31:0]  // master → slave
HRDATA[31:0]  // slave  → master
HREADY        // slave: xfer complete
HRESP [1:0]   // OKAY, ERROR, RETRY, SPLIT

// Select (address phase, driven by the decoder)
HSEL_x        // per-slave, decoded from HADDR

Two phases, pipelined

  • Address phase of transfer N+1 overlaps the data phase of transfer N.
  • New address every cycle unless HREADY is low.
  • HREADY is the one and only backpressure signal — it gates the whole bus.
HTRANS encodes what the current address means: is it the start of a new transfer (NONSEQ), a continuation of a burst (SEQ), a break in an otherwise ongoing burst (BUSY), or no transfer at all (IDLE)?
04

AHB — The Pipelined Handshake

Figure: two consecutive reads; the address phase of N+1 overlaps the data phase of N.

Cycle     1        2      3      4
HADDR     A0       A1     A2
HTRANS    NONSEQ   SEQ    SEQ
HREADY    1        1      1      1
HRDATA             D0     D1     D2

Rules:
  • HADDR is latched at the rising edge when HREADY=1.
  • HRDATA is valid one cycle later, when HREADY=1.
  • A slave can stretch phases by holding HREADY low.
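The overlap of address and data phases can be sketched as a small cycle-level model. This is a behavioural illustration in Python, not RTL; the function name `ahb_read_timeline` and the `wait_states` parameter are assumptions made for the example.

```python
def ahb_read_timeline(addresses, wait_states=None):
    """Cycle-by-cycle view of back-to-back AHB reads.

    addresses   : address labels, one per transfer
    wait_states : optional list of extra data-phase cycles per transfer
    Returns a list of (cycle, addr_phase, data_phase, hready) tuples.
    """
    waits = list(wait_states or [0] * len(addresses))
    timeline = []
    addr_idx = 0      # transfer currently in its address phase
    data_idx = None   # transfer currently in its data phase
    remaining = 0     # wait states left for the data-phase transfer
    cycle = 0
    while addr_idx < len(addresses) or data_idx is not None:
        # HREADY is low only while the data-phase slave inserts wait states.
        hready = 1 if data_idx is None or remaining == 0 else 0
        addr = addresses[addr_idx] if addr_idx < len(addresses) else None
        data = addresses[data_idx] if data_idx is not None else None
        timeline.append((cycle, addr, data, hready))
        if hready:
            # Pipeline advances: the address phase becomes the data phase.
            if addr_idx < len(addresses):
                data_idx = addr_idx
                remaining = waits[data_idx]
                addr_idx += 1
            else:
                data_idx = None
        else:
            remaining -= 1  # both phases are held for this cycle
        cycle += 1
    return timeline


for row in ahb_read_timeline(["A0", "A1", "A2"]):
    print(row)
```

With no wait states, three reads finish in four cycles: one cycle of pipeline fill, then one transfer per cycle. Passing `wait_states=[1, 0, 0]` holds A1 in the address phase for an extra cycle while A0's data phase is stretched.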
05

AHB — Wait States & Backpressure

  • Slave drives HREADY low to extend the current data phase — and by the pipelining rule, that also extends the address phase of the next transfer.
  • Each slave drives its own HREADYOUT; a central multiplexer selects the HREADYOUT of the slave that owns the current data phase, and the result is the bus HREADY fed back to the master and to every slave.
  • Idle slaves must drive HREADYOUT = 1; only the addressed slave may pull it low.
  • In AHB-Lite HREADY = HREADYOUT of the selected slave (no arbitration).
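A minimal sketch of how the bus-level HREADY can be derived, assuming the interconnect remembers which slave was selected in the previous address phase. The class name `HreadyMux` and its methods are hypothetical, and the model is behavioural Python rather than RTL.

```python
class HreadyMux:
    """Bus HREADY follows the HREADYOUT of the slave that owns the data phase."""

    def __init__(self):
        self.data_sel = None  # index of the slave in its data phase, if any

    def hready(self, hreadyout):
        # With no slave in a data phase, the bus is ready by default.
        if self.data_sel is None:
            return 1
        return hreadyout[self.data_sel]

    def clock(self, addr_sel, hreadyout):
        """Rising edge: the address-phase select moves to the data phase
        only when the bus is ready; otherwise the current owner is held."""
        if self.hready(hreadyout):
            self.data_sel = addr_sel


mux = HreadyMux()
mux.clock(addr_sel=1, hreadyout=[1, 0])   # slave 1 now owns the data phase
print(mux.hready([1, 0]))                  # slave 1 is stalling the bus
```

Slave 0 driving HREADYOUT=1 has no effect while slave 1 owns the data phase, which is why an unselected slave that drives 0 only hurts once it is muxed in.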

Zero-wait-state slave

If a slave can always complete a data phase in one cycle, it simply ties HREADYOUT high — and the pipelined bus runs at peak.

Slow slave

Flash or external RAM drops HREADYOUT until its own timing is met. The master sees the whole bus stall — an intentional design point for MCU-class systems where worst-case stalls are rare.

06

AHB — Burst Types

HBURST  Name    Length     Address step
000     SINGLE  1          -
001     INCR    undefined  by HSIZE
010     WRAP4   4          wraps at 4·HSIZE
011     INCR4   4          by HSIZE
100     WRAP8   8          wraps at 8·HSIZE
101     INCR8   8          by HSIZE
110     WRAP16  16         wraps at 16·HSIZE
111     INCR16  16         by HSIZE

Why WRAP bursts?

Cache line fills. A cache-line miss wants the critical word first, then the remaining words in wrap-around order — so the CPU gets data it can use on the first beat.

WRAP4 at address 0x08 with HSIZE=word gives 0x08 → 0x0C → 0x00 → 0x04.

Undefined-length INCR is useful when the master doesn't know in advance how long the burst will be (e.g. DMA serving a FIFO) — it simply starts with NONSEQ, continues with SEQ, and ends by issuing IDLE or a new NONSEQ on a different address.
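The address sequences for INCR and WRAP bursts can be sketched as follows. This is an illustrative Python helper, not part of any specification; the name `burst_addresses` and its argument order are assumptions. `hsize` is the encoded HSIZE value, so the beat size in bytes is `1 << hsize`.

```python
def burst_addresses(start, hsize, beats, wrap=False):
    """Return the byte address of every beat in a fixed-length burst.

    start : address of the first beat (must be aligned to the beat size)
    hsize : encoded HSIZE (0 = byte, 1 = halfword, 2 = word, ...)
    beats : number of beats (4, 8 or 16 for the fixed-length bursts)
    wrap  : True for WRAPx, False for INCRx
    """
    size = 1 << hsize
    if start % size:
        raise ValueError("burst start must be aligned to the beat size")
    if not wrap:
        return [start + i * size for i in range(beats)]
    block = size * beats                 # wrap boundary in bytes
    base = start - (start % block)       # aligned start of the wrap block
    return [base + ((start - base + i * size) % block) for i in range(beats)]


print([hex(a) for a in burst_addresses(0x08, 2, 4, wrap=True)])
print([hex(a) for a in burst_addresses(0x38, 2, 4, wrap=False)])
```

The first call reproduces the WRAP4 example above (0x08, 0x0C, 0x00, 0x04); the second is a plain INCR4 that keeps incrementing past the 16-byte block.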
07

AHB — Response Types

HRESP[1:0]  Name   Meaning
00          OKAY   Normal success
01          ERROR  Transfer failed (bus error)
10          RETRY  Master must retry later (AHB only)
11          SPLIT  Slave will release bus & raise HSPLITx(n) when ready

AHB-Lite has only OKAY and ERROR — the RETRY / SPLIT machinery was removed as part of the simplification.

SPLIT — the clever bit of AHB

A slow slave (e.g. a memory controller waiting on an off-chip burst) could respond SPLIT. The arbiter would remove that master from arbitration until the slave asserted HSPLITx[n] for that master number — like a credit-return.

This let other masters continue using the bus during the long latency. Conceptually close to AXI's out-of-order IDs — the same idea, older implementation.

Why SPLIT died: verifying it was painful, arbiters needed per-master SPLIT queues, and AXI subsumed the use case. AHB-Lite kept everything else but dropped SPLIT.
08

AHB — Arbitration & Multi-Master

  • In full AHB, up to 16 masters share the bus via a central arbiter.
  • Per-master signals: HBUSREQx, HLOCKx (atomic lock), HGRANTx.
  • Arbiter drives HMASTER[3:0] so slaves can identify who's talking (important for SPLIT).
  • Policy is implementation-defined — typically round-robin, sometimes fixed priority for real-time masters.
  • Masters win the grant one transfer at a time (or one burst at a time if HMASTLOCK asserted).
Figure: full AHB with 3 masters (CPU, DMA, USB) feeding a central arbiter + address mux, which drives three slaves (SRAM, ROM, APB bridge). The round-robin arbiter grants per transfer.
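Since the arbitration policy is implementation-defined, the following is only one possible round-robin scheme, sketched in Python. The function name `round_robin_grant` is hypothetical, and a real arbiter would also handle locked transfers and a default master, both omitted here.

```python
def round_robin_grant(requests, last_granted):
    """Pick the next master to grant.

    requests     : list of 0/1 request flags (HBUSREQx), indexed by master
    last_granted : index of the master granted most recently
    Returns the index of the granted master, or None if nobody requests.
    """
    n = len(requests)
    # Start searching just after the last winner so it gets lowest priority.
    for offset in range(1, n + 1):
        idx = (last_granted + offset) % n
        if requests[idx]:
            return idx
    return None


# CPU = 0, DMA = 1, USB = 2; CPU and DMA both request.
print(round_robin_grant([1, 1, 0], last_granted=0))
print(round_robin_grant([1, 1, 0], last_granted=1))
```

The two calls alternate between DMA and CPU, which is the fairness property round-robin is chosen for.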
09

AHB-Lite (2006) — The Subset that Won

  • Introduced alongside Cortex-M3 in 2004 and formalised in the 2006 AMBA 3 AHB-Lite specification.
  • Exactly one master. No arbitration signals, no HMASTER, no HBUSREQ.
  • No RETRY / SPLIT — only OKAY and ERROR.
  • Matches an MCU CPU well: each Cortex-M3/M4 bus port (I-Code, D-Code, System) is an AHB-Lite master talking to a bus matrix.

For multi-master MCU designs, Arm published multi-layer AHB (2002) — a simple interconnect of AHB-Lite masters crossbarred onto AHB slaves. No shared single-master view; each master has its own port into the fabric.

Why AHB-Lite succeeded

  • Removing arbitration cut ~40% of AHB's verification surface.
  • MCUs are single-threaded — the CPU is usually the only master for minutes at a time.
  • When multi-master is really needed, a multi-layer bus matrix gives deterministic bandwidth instead of arbitration jitter.
Cortex-M ports: I-Code (fetch via AHB-Lite to flash), D-Code (constant loads to flash), System (everything else). The "bus matrix" inside the MCU decodes HADDR ranges and routes to SRAM / ROM / APB bridge / DMA.
10

AHB5 (2015) — Security, Atomics, & More

Key additions over AHB-Lite

  • HNONSEC — 1 bit: 0 = Secure, 1 = Non-Secure. Aligns AHB with TrustZone for Armv8-M and Armv8-A.
  • HEXCL / HEXOKAY: exclusive access support, the bus-level backing for the CPU's LDREX/STREX pair and the counterpart of AXI's exclusive accesses (AxLOCK, EXOKAY).
  • HMASTER extended to help the PPC / bus filter identify the originating master for security checks.
  • HAUSER / HWUSER / HRUSER — user sideband signals for implementation-specific attributes.
  • Optional multi-copy atomicity (MCA) compliance — a system-level guarantee that all observers see writes in the same order.

Why AHB needed security

When TrustZone for Armv8-M arrived in 2016 (Cortex-M23, M33), the core internally produced Secure/Non-Secure transactions, and that security attribute had to propagate through the AHB fabric to the protection controllers (MPC / PPC), the bus matrix, and the APB bridge.

An AHB5 SoC uses HNONSEC to gate which peripherals a Non-Secure master can reach — the same "attribution-follows-the-transaction" idea you get in AXI4's AxPROT[1].
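A minimal sketch of such a gate, in Python. The address map, the region comments and the function name `filter_transfer` are all hypothetical, and returning ERROR is only one possible policy; some filters read-as-zero and ignore writes instead.

```python
# Hypothetical address map: (base, limit, secure_only)
REGIONS = [
    (0x4000_0000, 0x4000_0FFF, False),  # e.g. a UART, open to both worlds
    (0x5000_0000, 0x5000_0FFF, True),   # e.g. a key store, Secure only
]


def filter_transfer(haddr, hnonsec):
    """Decide the response for one transfer based on HADDR and HNONSEC.

    hnonsec : 0 = Secure, 1 = Non-Secure (the AHB5 encoding)
    """
    for base, limit, secure_only in REGIONS:
        if base <= haddr <= limit:
            if secure_only and hnonsec == 1:
                return "ERROR"   # Non-Secure access to a Secure-only slave
            return "OKAY"
    return "ERROR"               # unmapped address


print(filter_transfer(0x5000_0010, hnonsec=1))
print(filter_transfer(0x5000_0010, hnonsec=0))
```

The attribute travels with each transfer, so the same master can be allowed on one access and blocked on the next.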
11

APB — The Peripheral Bus

PCLK                // single clock
PRESETn             // active-low reset
PADDR [31:0]
PSEL_x              // per-slave select
PENABLE             // two-phase indicator
PWRITE
PWDATA[31:0]
PRDATA[31:0]
PREADY              // (APB3+) slave ack
PSLVERR             // (APB3+) error
PPROT [2:0]         // (APB4+) secure/priv/data
PSTRB [3:0]         // (APB4+) byte strobes
PNSE / PWAKEUP      // (APB5) security / wake

Two-phase handshake

  • SETUP: PSEL=1, PENABLE=0, PADDR + PWRITE + PWDATA valid.
  • ACCESS: PSEL=1, PENABLE=1. In APB3+ slave must drive PREADY=1 to complete, or 0 to stretch.
  • Transfer ends; go back to IDLE (PSEL=0), or straight to SETUP if another transfer is pending.
Minimum transaction = 2 cycles. No bursts. No concurrency. It's not supposed to be fast — it's supposed to be tiny.
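The SETUP/ACCESS sequencing can be sketched as a three-state machine. This Python model is illustrative only; the function name `apb_next_state` and the `transfer_pending` flag are assumptions. It allows back-to-back transfers to move from ACCESS straight to SETUP.

```python
def apb_next_state(state, transfer_pending, pready):
    """Next state of an APB3+ requester.

    IDLE   : PSEL=0, PENABLE=0
    SETUP  : PSEL=1, PENABLE=0 (always exactly one cycle)
    ACCESS : PSEL=1, PENABLE=1 (held until PREADY=1)
    """
    if state == "IDLE":
        return "SETUP" if transfer_pending else "IDLE"
    if state == "SETUP":
        return "ACCESS"
    if state == "ACCESS":
        if not pready:
            return "ACCESS"          # slave stretches the access
        return "SETUP" if transfer_pending else "IDLE"
    raise ValueError(f"unknown state: {state}")


# One transfer with a single wait state.
state = "IDLE"
for pending, pready in [(True, 0), (False, 0), (False, 0), (False, 1)]:
    state = apb_next_state(state, pending, pready)
    print(state)
```

With PREADY tied high the machine spends exactly two cycles per transfer, which is the minimum stated above.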
12

APB — Waveform

Figure: APB3 write followed by APB3 read; always two phases (SETUP, ACCESS). Signals shown: PCLK, PSEL_x, PENABLE, PWRITE, PADDR (addr W, then addr R), PREADY. Phase sequence: SETUP, ACCESS, SETUP, ACCESS.

Phase rules:
  • SETUP: PSEL=1, PENABLE=0, 1 cycle.
  • ACCESS: PSEL=1, PENABLE=1, 1 or more cycles.
  • The slave holds PREADY low to stretch ACCESS.
  • After PREADY=1 in ACCESS, return to IDLE.
13

APB Evolution — APB2/3/4/5

Version  Year  Added signals / features
APB2     1999  Baseline (AMBA 2; APB itself dates from the original 1996 AMBA). PSEL/PENABLE/PWRITE/PADDR/P{W,R}DATA. No wait states: the ACCESS phase always lasts one cycle.
APB3     2003  PREADY (wait states), PSLVERR (error response).
APB4     2010  PPROT[2:0] (bit 0 privileged, bit 1 Non-secure, bit 2 instruction). PSTRB[3:0] (byte strobes).
APB5     2021  PWAKEUP (low-power wake), user signals, PNSE (RME, extends the Non-secure attribute carried on PPROT[1]).

APB's killer feature — lack of features

A UART with 8 registers at 100 kHz doesn't need burst transfers, out-of-order IDs, or cache coherency. It needs a clocked 32-bit interface with byte-strobed writes and an ack.

APB4 gives you exactly that in ~10 signals. It's the reason each APB bus has exactly one master (typically an AHB-to-APB bridge), while an SoC can hang hundreds of APB slaves off its bridges.

14

AHB-to-APB Bridge

  • The canonical boundary between the fast AHB system bus and the slow APB peripheral bus is an AHB-to-APB bridge.
  • Responsibilities of the bridge:
    • Accept AHB transactions; hold HREADYOUT low until the APB side completes.
    • Drive PSEL / PENABLE according to the two-phase APB protocol.
    • Translate HSIZE / HWRITE / HPROT into PSTRB / PWRITE / PPROT.
    • Map AHB address ranges to PSELx lines (one PSEL per peripheral).
    • Return PSLVERR → HRESP=ERROR cleanly.
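The HSIZE-to-PSTRB translation in that list can be sketched as below, assuming a 32-bit little-endian APB4 data bus. The function name `pstrb_from_ahb` is hypothetical. For reads the strobes are driven low, as APB4 requires.

```python
def pstrb_from_ahb(haddr, hsize, hwrite):
    """Derive the 4-bit PSTRB for a 32-bit APB4 bus from an AHB transfer.

    haddr  : AHB byte address
    hsize  : encoded HSIZE (0 = byte, 1 = halfword, 2 = word)
    hwrite : True for a write, False for a read
    """
    if not hwrite:
        return 0b0000                      # strobes are low on reads
    nbytes = 1 << hsize
    if nbytes > 4:
        raise ValueError("transfer wider than the 32-bit APB data bus")
    if haddr % nbytes:
        raise ValueError("unaligned AHB transfer")
    lane = haddr & 0x3                     # first active byte lane
    return ((1 << nbytes) - 1) << lane     # one strobe bit per written byte


print(format(pstrb_from_ahb(0x1001, 0, True), "04b"))  # byte write, lane 1
print(format(pstrb_from_ahb(0x1002, 1, True), "04b"))  # halfword, lanes 2-3
print(format(pstrb_from_ahb(0x1000, 2, True), "04b"))  # full word
```

A bridge without PSTRB (APB2/APB3) cannot express sub-word writes this way, so the peripheral has to treat every write as a full word.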

Clock domain crossing

Many bridges also perform a clock-domain crossing — AHB at CPU speed (e.g. 200 MHz), APB at a slow peripheral clock (e.g. 25 MHz).

The bridge buffers one AHB request at a time, synchronises it across to PCLK, runs the APB transaction, and synchronises the result back. No address pipelining across the bridge.

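Many STM32 parts have two APB buses: APB1 (slower, many low-speed peripherals) and APB2 (faster, e.g. for the ADCs and advanced timers), each fed by its own AHB-to-APB bridge. On some families, such as the STM32F4, the GPIO ports sit directly on AHB instead.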
15

Multi-Layer AHB

  • For MCU-class systems that need 2–4 masters (CPU + DMA + USB, say) without the complexity of AXI.
  • Each master has its own AHB-Lite port into the fabric — no shared address bus.
  • The matrix contains one address decoder per master port and one arbiter per slave port; the decoder routes each request to its target slave.
  • If two masters target different slaves, they run concurrently with full bandwidth. If they target the same slave, the per-slave arbiter (round-robin or fixed) picks a winner.
Figure: multi-layer AHB (3×3). Three masters (CPU-M, DMA-M, USB-M) enter an interconnect with a per-slave arbiter, which drives three slaves (SRAM, Flash, APB bridge). Parallelism where paths don't collide.
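One cycle of per-slave arbitration can be sketched as follows. This is a behavioural Python model under a fixed-priority assumption; the function name `matrix_cycle` and the data layout are hypothetical.

```python
def matrix_cycle(requests, priority):
    """Resolve one cycle of a multi-layer AHB matrix.

    requests : dict mapping master name to its target slave (or None)
    priority : list of master names, highest priority first
    Returns (grants, stalled): grants maps slave to the winning master,
    stalled lists the masters that must wait this cycle.
    """
    grants = {}
    stalled = []
    for master in priority:
        slave = requests.get(master)
        if slave is None:
            continue                  # master is idle this cycle
        if slave in grants:
            stalled.append(master)    # slave already taken: HREADY held low
        else:
            grants[slave] = master
    return grants, stalled


order = ["CPU", "DMA", "USB"]
print(matrix_cycle({"CPU": "SRAM", "DMA": "Flash", "USB": "APB"}, order))
print(matrix_cycle({"CPU": "SRAM", "DMA": "SRAM", "USB": None}, order))
```

In the first call all three masters proceed in parallel; in the second, the DMA stalls because the CPU wins the SRAM port.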
16

Minimal AHB-Lite Slave in 30 lines

module ahb_lite_sram #(parameter ADDR_W = 12) (
  input  logic            HCLK,
  input  logic            HRESETn,
  input  logic            HSEL,
  input  logic            HREADY,     // bus-level ready, from the HREADY mux
  input  logic [31:0]     HADDR,
  input  logic [1:0]      HTRANS,
  input  logic            HWRITE,
  input  logic [2:0]      HSIZE,      // unused here: word accesses only
  input  logic [31:0]     HWDATA,
  output logic [31:0]     HRDATA,
  output logic            HREADYOUT,
  output logic            HRESP       // AHB-Lite: 0 = OKAY, 1 = ERROR
);
  logic [31:0] mem [0:(1<<ADDR_W)-1];
  logic [ADDR_W-1:0] addr_q;
  logic write_q, sel_q;

  // Sample the address phase only when the bus is ready
  always_ff @(posedge HCLK or negedge HRESETn)
    if (!HRESETn) {sel_q, write_q, addr_q} <= '0;
    else if (HREADY) begin
      sel_q   <= HSEL && HTRANS[1];          // NONSEQ or SEQ
      write_q <= HWRITE;
      addr_q  <= HADDR[ADDR_W+1:2];           // word addr
    end

  // Data phase: commit the write sampled in the previous address phase
  always_ff @(posedge HCLK)
    if (sel_q && write_q) mem[addr_q] <= HWDATA;

  assign HRDATA     = (sel_q && !write_q) ? mem[addr_q] : '0;
  assign HREADYOUT  = 1'b1;   // zero-wait-state slave
  assign HRESP      = 1'b0;   // OKAY
endmodule

Single-port SRAM, zero wait states, OKAY only. Real peripherals add HSIZE handling, PROT checks, and byte strobes.

17

Common AHB / APB Pitfalls

Pitfall: forgetting HTRANS gating

A slave that latches HADDR unconditionally will misfire on IDLE / BUSY cycles. Always gate on HSEL && HTRANS[1] && HREADY.

Pitfall: HREADYOUT during idle

Idle slaves must drive HREADYOUT = 1; if a slave tristates or drives 0, the whole bus hangs.

Pitfall: ERROR response timing

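AHB mandates a two-cycle ERROR: in the first data cycle drive HRESP=ERROR with HREADY=0, and in the second drive HRESP=ERROR with HREADY=1. The extra cycle gives the master time to cancel the transfer it has already placed in the address phase (by driving IDLE). Miss this and the master can sample the error too late or not at all, so its fault handling misbehaves.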
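The two-cycle response can be written down as a short sequence generator. This is a Python sketch for illustration; the function names are hypothetical, and cancelling the queued transfer with IDLE is one permitted master policy rather than the only one.

```python
def error_response_cycles(wait_states=0):
    """(HRESP, HREADY) driven by the slave for one failing data phase."""
    cycles = [("OKAY", 0)] * wait_states   # optional wait states first
    cycles.append(("ERROR", 0))            # cycle 1: warn the master
    cycles.append(("ERROR", 1))            # cycle 2: complete the transfer
    return cycles


def master_next_htrans(hresp, hready, planned):
    """HTRANS the master drives next, given what it sees this cycle."""
    if hresp == "ERROR" and hready == 0:
        return "IDLE"                      # cancel the queued address phase
    return planned


for hresp, hready in error_response_cycles(wait_states=1):
    print(hresp, hready, master_next_htrans(hresp, hready, "SEQ"))
```

The master swaps its planned SEQ for IDLE exactly once, in the cycle where it sees ERROR with HREADY low.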

Pitfall: APB without PREADY timeout

If an APB slave misbehaves and never asserts PREADY, the bridge, and thus the whole AHB above it, stalls indefinitely. A defensive bridge adds a watchdog that abandons the access on timeout and reports an error upstream (HRESP=ERROR).

Pitfall: Mixed HSIZE on the same burst

All beats in an AHB burst must share HSIZE. A half-word + word burst is protocol-illegal — split into two bursts instead.

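Pitfall: INCR across 1 KB boundary

AHB forbids a burst from crossing a 1 KB address boundary. The rule bites incrementing bursts; a WRAP burst stays inside its own aligned block. Masters must split longer transfers. A defensive slave may respond ERROR if it sees a violation anyway.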
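A minimal sketch of how a master (for example a DMA engine) might split a long incrementing transfer so that no burst crosses a 1 KB boundary. The function name `split_at_1kb` is hypothetical, and the sketch ignores the further split into legal burst lengths.

```python
def split_at_1kb(start, nbytes):
    """Split [start, start + nbytes) into chunks that never cross 1 KB.

    Returns a list of (address, length_in_bytes) tuples.
    """
    chunks = []
    addr = start
    remaining = nbytes
    while remaining > 0:
        room = 0x400 - (addr & 0x3FF)   # bytes left before the next boundary
        length = min(room, remaining)
        chunks.append((addr, length))
        addr += length
        remaining -= length
    return chunks


for addr, length in split_at_1kb(0x3F0, 0x20):
    print(hex(addr), hex(length))
```

A 32-byte transfer starting at 0x3F0 becomes two 16-byte pieces, one on each side of 0x400; each piece then starts with its own NONSEQ.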

18

AHB vs AXI — When Each Wins

Dimension                 AHB-Lite                          AXI4
Channels                  1 (address+data shared phases)    5 independent
Out-of-order              No                                Yes, via AxID
Outstanding transactions  1                                 Many (implementation limit)
Burst length              ≤16                               ≤256
Typical area              Very small                        Larger
Power                     Easy to clock-gate                Needs more care
Good for                  MCUs, register mgmt               High-BW masters

The pragmatic answer

Use AHB-Lite where the source traffic is bursty but needs low latency (an MCU CPU accessing on-chip SRAM or flash). Use AXI where the source is high-throughput and latency-tolerant (DMA, GPU, NIC, NPU).

Modern MCU reality: Cortex-M55 has an AXI5 master port and AHB5 ports. The AXI5 port reaches cacheable main memory; a peripheral AHB5 port talks to low-latency peripherals, and an AHB5 slave port gives a DMA access to the TCMs. One core, two AMBA protocols.
19

Minimal APB Slave & Loopback Patterns

Minimal APB3 slave — ~15 lines

module apb_loopback_reg (
  input  logic        PCLK, PRESETn,
  input  logic        PSEL, PENABLE, PWRITE,
  input  logic [31:0] PADDR, PWDATA,
  output logic [31:0] PRDATA,
  output logic        PREADY, PSLVERR
);
  logic [31:0] r_q;
  always_ff @(posedge PCLK or negedge PRESETn)
    if (!PRESETn)                          r_q <= 32'h0;
    else if (PSEL && PENABLE && PWRITE)  r_q <= PWDATA;

  assign PRDATA  = r_q;
  assign PREADY  = 1'b1;   // always ready (zero wait states)
  assign PSLVERR = 1'b0;
endmodule

This really is the minimum: one flop, three combinational outputs. PREADY tied high makes every access a two-cycle SETUP/ACCESS round-trip.

Even smaller — APB read-only status register

module apb_status_ro #(parameter W = 32) (
  input  logic        PCLK, PRESETn,
  input  logic        PSEL, PENABLE,
  input  logic [W-1:0] status_i,   // external live signal
  output logic [31:0] PRDATA,
  output logic        PREADY, PSLVERR
);
  assign PRDATA  = {{(32-W){1'b0}}, status_i}; // pure wires
  assign PREADY  = 1'b1;
  assign PSLVERR = 1'b0;
endmodule

Loopback use cases

  • APB ping slave — the one-register version is the go-to "Is the bridge alive?" DV probe.
  • AHB-Lite loopback: extend the 30-line slave from earlier so that a read returns the HWDATA captured on the previous write. Useful for back-to-back read/write stress.
  • Single-register AHB pipeline stage — one flop on HADDR/HTRANS/HWDATA + one flop on HRDATA is the minimum retiming cell for AHB fabric timing closure.

For HW bring-up silicon: a mask-ROM'd APB slave often returns a fixed "MAGIC" word so the first boot-check is just if (*APB_BASE == 0xDEADBEEF) go;

20

Interview-Ready Takeaways

  • "Why is AHB pipelined?" → To double utilisation of the data bus. Address phase N+1 overlaps data phase N so the bus never idles between transfers.
  • "What does HREADY actually do?" → It's the single backpressure signal. Driven by the slave that owns the data phase, it stretches that data phase and the next transfer's address phase together.
  • "When would you use WRAP4 vs INCR4?" → WRAP4 for cache-line fills (critical-word-first); INCR4 for linear streaming.
  • "Why did AHB-Lite drop SPLIT/RETRY?" → Single-master use case made them unnecessary; the verification cost wasn't worth it.
  • "Why two phases on APB?" → Deterministic slave design — one cycle to set up, one cycle to access. Lets you write tiny slaves without FSMs.
  • "When do you choose APB over AHB?" → When the peripheral's traffic rate is far below the CPU clock (UART, GPIO, timer) — use APB and save area/power.
  • "What does HNONSEC buy you in AHB5?" → Security-attribute propagation — the same TrustZone contract as AXI4's AxPROT[1], so a Non-Secure transaction can be blocked at the bus matrix.
  • "Why does the STM32 have APB1 and APB2?" → Two peripheral clock domains with separate prescalers: on an STM32F4 at full speed, APB1 runs at up to HCLK/4 and APB2 at up to HCLK/2. Each has its own bridge.

Educational use.