ARM AMBA · PRESENTATION 01

History of AMBA

From the 1996 System Bus to 2024 Chiplets · Three Decades of On-Chip Interconnect
ASB · APB · AHB · AXI · ACE · CHI · CHI-C2C · UCIe
02

Why AMBA History Matters

  • By 1996 every major SoC vendor had a proprietary bus (TI, Motorola, Philips, IBM). Arm's customers built chips containing other licensees' cores and had no neutral connective tissue.
  • AMBA was Arm's answer: an open specification anyone could use royalty-free, whether or not they licensed an Arm core.
  • The protocol layers you use on an Armv9 server today — APB for peripherals, AXI for NIC/GPU traffic, CHI for cache coherency — trace unbroken lineage back to that 1996 document.
  • Understanding why each successor exists (AHB → AXI → CHI) makes interview answers on backpressure, ordering, and coherency far more grounded.
"AMBA was never meant to be a commercial product. It was a piece of technical diplomacy — a way to get our customers to stop wasting engineering cycles reinventing the same bus." — Mike Muller, CTO of Arm 1990–2019, paraphrased from multiple interviews

The five successive AMBA generations map closely to five successive CPU generations: ARM7 needed ASB, ARM9/11 needed AHB, Cortex-A8 needed AXI, Cortex-A15 needed ACE, and Neoverse needed CHI.

03

Prehistory — The Proprietary-Bus Era

  • Early 1990s SoCs were mostly glue logic around a single CPU and a handful of peripherals, and each vendor wired them together its own way. The pattern persisted; later vendor-specific or niche buses included:
    • IBM CoreConnect (PLB, OPB, DCR)
    • Altera Avalon and Xilinx OPB-derived fabric
    • Silicore Wishbone (open, but not widely adopted)
  • Arm's licensees built SoCs integrating an ARM7 with a DSP, a UART, and SRAM — each from a different vendor, each on a different bus, glued together with bridges.
  • The integration cost was often larger than the cost of the Arm core itself.

The verification tax

Every new bridge was a new verification problem. VIP (Verification IP) did not yet exist commercially. A single SoC could contain three distinct bus dialects, each with its own BFM.

Arm's strategic move

If Arm defined a bus that anyone could implement for free, every Arm-ecosystem peripheral would speak the same language. The pitch was "AMBA makes our core more valuable without costing you anything."

04

AMBA 1 (1996) — ASB & APB

ASB — Advanced System Bus

  • 32-bit, tri-state data bus (driven by whichever master won arbitration).
  • Two-phase: ADDRESS then DATA on consecutive clocks. No pipelining.
  • Centralised arbiter with a request/grant pair (AREQx/AGNTx) per master.
  • Matched the ARM7 memory interface — same cycle model.

APB — Advanced Peripheral Bus

  • Simple, low-power, two-phase: PSEL (setup), then PENABLE (enable).
  • Single master (a bridge from ASB or AHB), no wait states in APB2.
  • For UART, timers, GPIO — things that did not need bandwidth.
[Figure: AMBA 1 SoC, typical topology. ARM7 and DSP on the ASB with an ASB arbiter; a bridge connects the ASB to the APB, which serves the UART, timer, and GPIO.]

ASB's tri-state data bus would become its downfall — high-speed synthesis tools struggled with bidirectional nets on-die.
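
The two-phase APB access described above can be sketched as a pair of bus cycles. This is a minimal illustrative model, assuming a single write with no wait states; the function name `apb2_write` and the dictionary layout are hypothetical, and only the signal names come from the APB specification.

```python
def apb2_write(addr, data):
    """Per-cycle signal values for one APB2 write (two cycles, no wait states)."""
    # First cycle (setup): the peripheral is selected, PENABLE is still low.
    setup = {"PSEL": 1, "PENABLE": 0, "PWRITE": 1, "PADDR": addr, "PWDATA": data}
    # Second cycle (enable): same address and data, PENABLE raised to complete the access.
    enable = dict(setup, PENABLE=1)
    return [setup, enable]

cycles = apb2_write(0x40000004, 0xA5)
for n, signals in enumerate(cycles):
    print(f"cycle {n}: PSEL={signals['PSEL']} PENABLE={signals['PENABLE']}")
```

Every access costs two cycles regardless of the peripheral, which is the trade the slide describes: minimal logic in exchange for low bandwidth.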

05

AMBA 2 (1999) — AHB Arrives

  • AHB — Advanced High-performance Bus — replaced ASB for anything that mattered.
  • Single-edge, fully synchronous. No tri-state; data is a unidirectional mux (HRDATA from slave, HWDATA from master).
  • Pipelined address & data phases — the address for transfer N+1 is driven while data for transfer N completes.
  • Burst transfers: SINGLE, INCR, INCR4/8/16, WRAP4/8/16.
  • Split / retry handshake lets slow slaves (off-chip memory) release the bus while they wait.
  • APB retained unchanged; all APB peripherals now sat behind an AHB-to-APB bridge.
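
The wrapping bursts in the list above are easiest to understand from their address sequence. The sketch below is illustrative only: the helper name `wrap_burst_addresses` is hypothetical, and it assumes the start address is aligned to the beat size.

```python
def wrap_burst_addresses(start, beats, bytes_per_beat):
    """Address of each beat in a wrapping burst (start assumed beat-aligned)."""
    span = beats * bytes_per_beat      # size of the wrap window in bytes
    base = (start // span) * span      # lower boundary of the window
    return [base + (start - base + i * bytes_per_beat) % span
            for i in range(beats)]

# WRAP4 of 32-bit words starting at 0x38 wraps at the 16-byte boundary.
addrs = wrap_burst_addresses(0x38, beats=4, bytes_per_beat=4)
print([hex(a) for a in addrs])   # ['0x38', '0x3c', '0x30', '0x34']
```

This is the access pattern of a cache-line fill that fetches the critical word first and then wraps around to collect the rest of the line.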

AHB signals — the core set

HCLK     HRESETn
HADDR    [31:0]
HTRANS   [1:0]   // IDLE/BUSY/NSEQ/SEQ
HWRITE
HSIZE    [2:0]
HBURST   [2:0]
HWDATA   [31:0]
HRDATA   [31:0]
HREADY           // slave: xfer done
HRESP    [1:0]   // OKAY/ERROR/RETRY/SPLIT
HSEL_x           // per-slave select
Why synchronous mattered: late-1990s STA (static timing analysis) tools were getting good enough to verify multi-million-gate synchronous designs. Tri-state on-die was a relic of an era when that wasn't possible.
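
To make the benefit of AHB's pipelined address and data phases concrete, here is a small sketch of which transfer occupies each phase per cycle. It assumes zero wait states (HREADY always high), and the function name `pipelined_schedule` is hypothetical.

```python
def pipelined_schedule(n_transfers):
    """List of (address-phase transfer, data-phase transfer) per cycle, zero wait states."""
    schedule = []
    for cycle in range(n_transfers + 1):
        addr = cycle if cycle < n_transfers else None   # transfer whose address is driven
        data = cycle - 1 if cycle >= 1 else None        # transfer whose data completes
        schedule.append((addr, data))
    return schedule

n = 4
sched = pipelined_schedule(n)
for cycle, (addr, data) in enumerate(sched):
    print(f"cycle {cycle}: address={addr} data={data}")
print("pipelined cycles:", len(sched), "vs strictly two-phase:", 2 * n)
```

With overlap, N back-to-back transfers take N + 1 cycles rather than 2N, which is where most of AHB's throughput gain over a strictly two-phase bus comes from.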
06

AMBA 2 SoCs — ARM9, ARM11, Cortex-M3

  • Every ARM9-based mobile phone chip from 2000–2006 had AHB at its centre.
  • TI OMAP1/2, Freescale iMX1, Samsung S3C2410, Qualcomm MSM5xxx — all AHB.
  • Multi-layer AHB (2002) was the quick fix for multi-master bandwidth: a simple crossbar of AHB-Lite masters onto one or more AHB slaves.
  • For Cortex-M (2004+), Arm chose AHB-Lite for the CPU's instruction & data ports — a simplified AHB with one master and no retry/split. Still the current MCU bus.
Era | Chip | Top-level bus
2001 | TI OMAP 1510 (ARM925 + C55x DSP) | Dual AHB + shared SRAM
2002 | Samsung S3C2410 (ARM920T) | AHB + APB
2003 | Freescale iMX21 (ARM926EJ-S) | Multi-layer AHB
2007 | STM32F103 (Cortex-M3) | AHB-Lite + APB1/APB2
2009 | Cortex-M0-based IoT MCUs | AHB-Lite only
AHB refused to die. Even today, the inside of a Cortex-M55 MCU subsystem uses AHB5-Lite to connect TCM, SRAM, and the peripheral bridge — because the complexity of AXI would buy no benefit below ~300 MHz.
07

AMBA 3 (2003) — AXI Redefines the Bus

  • AHB's pipelined address/data still coupled the two phases — one transaction blocked the next. A cache miss could stall the entire bus for tens of cycles.
  • AXI decoupled the channels completely. Five independent channels, each with its own VALID/READY handshake:
    • AW — Write Address
    • W — Write Data
    • B — Write Response
    • AR — Read Address
    • R — Read Data
  • Transactions tagged with AxID could be issued and returned out of order, finally letting a DRAM controller reorder for page hits.
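
The reordering freedom that AxID gives a memory controller can be illustrated with a toy scheduler. This is a sketch under stated assumptions, not a real controller policy: the function `schedule` and the (ID, row) request format are hypothetical, and the only rule modelled is that requests sharing an ID stay in order while different IDs may overtake each other.

```python
def schedule(requests):
    """requests: list of (axid, dram_row). Returns the service order as indices."""
    pending = list(range(len(requests)))
    order, open_row = [], None
    while pending:
        # A request is eligible only if no older pending request shares its ID.
        seen_ids, eligible = set(), []
        for i in pending:
            axid = requests[i][0]
            if axid not in seen_ids:
                eligible.append(i)
                seen_ids.add(axid)
        # Prefer the oldest eligible request that hits the currently open row.
        hits = [i for i in eligible if requests[i][1] == open_row]
        pick = hits[0] if hits else eligible[0]
        open_row = requests[pick][1]
        order.append(pick)
        pending.remove(pick)
    return order

reqs = [(0, "A"), (1, "B"), (2, "A"), (0, "B")]
order = schedule(reqs)
print(order)   # [0, 2, 1, 3]
```

Request 2 overtakes request 1 to reuse open row A, while the two ID-0 requests (indices 0 and 3) keep their original order.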

The five-channel revelation

Because each channel has its own backpressure, a long read burst does not stall a short write. A master with multiple outstanding IDs looks like a crossbar inside a single port.

Other AMBA 3 arrivals

  • ATB (AMBA Trace Bus) — unified the CoreSight trace fabric (ETM → TPIU).
  • APB3 — added PREADY for slave-side wait states and PSLVERR for error responses.
08

The AXI Handshake in One Diagram

[Figure: five AXI channels with independent VALID/READY handshakes, between a MASTER (CPU / DMA / GPU) and a SLAVE (DDR / SRAM).]
  • AW: write address channel
  • W: write data channel (burst)
  • B: write response channel
  • AR: read address channel
  • R: read data channel (burst, with RLAST)
Every channel: VALID (source) · READY (sink) · PAYLOAD · ID · LAST (on data channels)
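
The handshake rule shared by all five channels is that a beat transfers only in a cycle where VALID and READY are both high. A minimal sketch of that rule, with a hypothetical helper `transfers` and per-cycle lists standing in for waveforms:

```python
def transfers(valid, ready, payload):
    """Payloads accepted by the sink: one per cycle where VALID and READY are both high."""
    return [p for v, r, p in zip(valid, ready, payload) if v and r]

# The source holds "d0" for two cycles because the sink is not ready in cycle 0.
valid   = [1,    1,    1,    0,    1]
ready   = [0,    1,    1,    1,    1]
payload = ["d0", "d0", "d1", None, "d2"]
accepted = transfers(valid, ready, payload)
print(accepted)   # ['d0', 'd1', 'd2']
```

Either side can stall the channel on its own, which is what gives each channel independent backpressure.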
09

AMBA 4 (2010) — AXI4, AXI4-Lite, AXI4-Stream, ACE

AXI4 changes from AXI3

  • Burst length grew from 1–16 beats to 1–256 beats for INCR bursts (WRAP stays ≤16).
  • Write interleaving removed. In AXI3 a master could interleave write-data beats of different IDs; the feature was rarely used and added verification cost. AXI4 drops WID, so write data must be issued in the same order as the write addresses.
  • QoS (AxQOS, 4 bits), Region (AxREGION, 4 bits), and USER signals (implementation-defined).
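
The effect of the longer INCR burst limit can be sketched by splitting one large transfer into bursts. The helper `split_into_bursts` is hypothetical, and the model is deliberately simplified: it ignores the rule that a burst must not cross a 4 KB boundary. AxLEN carries the beat count minus one.

```python
def split_into_bursts(total_beats, max_beats):
    """Split a transfer into INCR bursts of at most max_beats (4 KB rule ignored)."""
    bursts = []
    while total_beats > 0:
        beats = min(total_beats, max_beats)
        bursts.append({"beats": beats, "AxLEN": beats - 1})
        total_beats -= beats
    return bursts

axi3 = split_into_bursts(600, max_beats=16)    # AXI3 limit
axi4 = split_into_bursts(600, max_beats=256)   # AXI4 INCR limit
print(len(axi3), len(axi4))   # 38 3
```

Fewer bursts means fewer address-channel handshakes and less arbitration for the same amount of data.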

Sub-profiles

  • AXI4-Lite — single-beat, no IDs. For CSR / register access.
  • AXI4-Stream — just T{DATA, VALID, READY, LAST, KEEP, USER}. No address at all. FPGA DSP pipelines standardised on it.
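
As an illustration of how an address-less stream carries packets, the sketch below cuts a byte string into beats with TKEEP and TLAST. The function `to_stream_beats` and the 4-byte bus width are assumptions for the example; TKEEP is modelled as one bit per valid byte lane, lowest lane first.

```python
def to_stream_beats(packet: bytes, width: int = 4):
    """Cut a packet into AXI4-Stream style beats of `width` bytes."""
    beats = []
    for off in range(0, len(packet), width):
        chunk = packet[off:off + width]
        beats.append({
            "TDATA": chunk.ljust(width, b"\x00"),          # pad unused byte lanes
            "TKEEP": (1 << len(chunk)) - 1,                # one bit per valid byte
            "TLAST": int(off + width >= len(packet)),      # marks the final beat
        })
    return beats

beats = to_stream_beats(b"abcdef", width=4)
for beat in beats:
    print(beat)
```

The six-byte packet becomes two beats: a full one with TKEEP 0b1111 and a final one with TKEEP 0b0011 and TLAST set.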

ACE — first AMBA coherency

  • Three new snoop channels: AC (snoop address), CR (snoop response), CD (snoop data).
  • Five cache states (MOESI variant).
  • New transaction types — ReadShared, ReadUnique, CleanUnique, MakeUnique, WriteBack, WriteClean.
  • DVM (Distributed Virtual Memory) messages for TLB invalidation, carried on the snoop address (AC) channel.
Cortex-A15 (2011) was the first big-core to use ACE, plugged into the CCI-400 cache-coherent interconnect with two coherent clusters. big.LITTLE was born.
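
The five ACE cache states line up with the familiar MOESI letters, and a snoop changes the state of the snooped line. The sketch below is a heavily simplified illustration: the function `snooped_state` is hypothetical, it covers only two snoop types, and it shows one permitted outcome per case, whereas the real protocol allows several.

```python
ACE_TO_MOESI = {
    "UniqueDirty": "M", "SharedDirty": "O", "UniqueClean": "E",
    "SharedClean": "S", "Invalid": "I",
}

def snooped_state(state, snoop):
    """Simplified next state of a snooped cache line (one permitted outcome)."""
    if state not in ACE_TO_MOESI:
        raise ValueError(f"unknown state: {state}")
    if state == "Invalid":
        return "Invalid"
    if snoop == "ReadUnique":      # requester wants the only copy
        return "Invalid"
    if snoop == "ReadShared":      # requester is willing to share the line
        return "SharedDirty" if state.endswith("Dirty") else "SharedClean"
    raise ValueError(f"unsupported snoop: {snoop}")

print(snooped_state("UniqueDirty", "ReadShared"))   # SharedDirty
print(snooped_state("SharedClean", "ReadUnique"))   # Invalid
```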
10

AMBA 5 (2013+) — CHI for Scale

  • ACE worked up to ~8 coherent CPUs. Beyond that, the snoop-broadcast overhead on a shared channel dominated.
  • For servers — Cortex-A57 and especially the new Neoverse line — Arm needed a clean-sheet protocol.
  • CHI — Coherent Hub Interface — is a packetised layered protocol, not a wire-level bus. It runs over any interconnect topology: ring, mesh, hybrid.
  • Four message channels: REQ, RSP, SNP, DAT.
  • Nodes are typed: RN-F / RN-I (requester), HN-F / HN-I (home), SN-F (slave node for Normal memory), MN (misc).
  • Home-based directory + snoop filter, not broadcast.
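
The scaling argument in the list above comes down to how many caches must be snooped per request. A toy comparison follows, with a hypothetical `SnoopFilter` class standing in for the directory kept at a home node; real snoop filters track far more state and can be imprecise.

```python
def broadcast_snoops(n_caches, requester):
    """Snoopy scheme: every other cache is snooped on each coherent request."""
    return [c for c in range(n_caches) if c != requester]

class SnoopFilter:
    """Toy home-node directory: line address -> set of caches that may hold it."""
    def __init__(self):
        self.sharers = {}

    def record(self, addr, cache):
        self.sharers.setdefault(addr, set()).add(cache)

    def snoop_targets(self, addr, requester):
        return sorted(self.sharers.get(addr, set()) - {requester})

sf = SnoopFilter()
sf.record(0x80, 3)
sf.record(0x80, 7)
print(len(broadcast_snoops(16, requester=0)))   # 15
print(sf.snoop_targets(0x80, requester=0))      # [3, 7]
```

With 16 caches, broadcast sends 15 snoops per request, while the filtered home node sends only the two that can matter. The gap widens as node count grows.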

Why "packetised"?

Each CHI message (REQ, SNP, etc.) is a small fixed-length packet with opcode, source ID, target ID, transaction ID. The underlying transport (Arm's CMN-600 mesh or a custom NoC) can retime, buffer, and route packets however it likes — as long as the protocol ordering rules are preserved.

This is the essential shift from earlier AMBA: AXI/ACE are signal-level protocols locked to a particular wire shape. CHI is transport-agnostic.
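
A packetised message of the kind described above can be sketched as a small record packed into a fixed-width word. The field names follow the slide (opcode, source ID, target ID, transaction ID), but the `Flit` class and all field widths here are hypothetical and are not taken from the CHI specification.

```python
from dataclasses import dataclass

# (field name, width in bits); widths are illustrative assumptions only.
FIELDS = (("tgt_id", 7), ("src_id", 7), ("txn_id", 8), ("opcode", 6))

@dataclass(frozen=True)
class Flit:
    tgt_id: int
    src_id: int
    txn_id: int
    opcode: int

    def pack(self) -> int:
        """Pack fields into one integer, first field in the lowest bits."""
        word, shift = 0, 0
        for name, width in FIELDS:
            value = getattr(self, name)
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name} does not fit in {width} bits")
            word |= value << shift
            shift += width
        return word

    @classmethod
    def unpack(cls, word: int) -> "Flit":
        values, shift = {}, 0
        for name, width in FIELDS:
            values[name] = (word >> shift) & ((1 << width) - 1)
            shift += width
        return cls(**values)

req = Flit(tgt_id=12, src_id=3, txn_id=0x5A, opcode=0x07)
assert Flit.unpack(req.pack()) == req
```

Because routing needs only the target ID, the transport can buffer or retime the packed word without understanding the rest of the message.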
11

AMBA Version Timeline

1996
AMBA 1 — ASB (32-bit tri-state) + APB (peripheral bus). Released as an open specification by Arm.
1999
AMBA 2 — AHB (pipelined, synchronous) replaces ASB; APB unchanged. Becomes the 32-bit SoC default.
2002
Multi-layer AHB published — crossbar of AHB-Lite masters, pragmatic bandwidth fix.
2003
AMBA 3 — AXI (5 independent channels, out-of-order IDs) + ATB (trace) + APB3 (PREADY/PSLVERR).
2004
Cortex-M3 adopts AHB-Lite. Millions of downstream MCUs will inherit it.
2010
AMBA 4 — AXI4 (bursts up to 256, QoS, USER), AXI4-Lite, AXI4-Stream, and ACE (first coherent AMBA).
2011
Cortex-A15 + CCI-400 — first big.LITTLE SoCs with coherent clusters over ACE.
2013
AMBA 5 CHI (Issue A) — packet-based, scales past 16 coherent nodes.
2017
AMBA 5 refresh — AXI5, AHB5, APB5, ACE5, CHI-B/C — add atomics, user request attributes, cache stashing, MTE/TrustZone hooks.
2019
CHI-D adds features for CMN-650 — MPAM hooks, per-transaction QoS.
2021
CHI-E arrives with CMN-700. RME (Realm Management Extension) hooks for CCA. Neoverse V1/V2 era.
2023
CHI-C2C announced — chiplet-to-chiplet CHI transport over UCIe. The AMBA-for-chiplets era begins.
12

Generations — Interactive

AMBA 4 (2010)
Protocols: AXI4 · AXI4-Lite · AXI4-Stream · ACE · ACE-Lite · APB4 · ATB
  • AXI4 extends AXI3 burst length to 256, removes write interleaving, adds QoS/REGION/USER.
  • AXI4-Lite: register-access subset; one beat, no IDs.
  • AXI4-Stream: pure data pipe for DSP/FPGA.
  • ACE brings five-state coherency and DVM.
  • Used in Cortex-A7/A15, Mali-T6xx GPUs, Arm CCI-400/500.
13

Why Five Protocols Coexist Today

  • A modern Arm server SoC contains all five protocols simultaneously — each chosen for fit, not for age.
  • Nothing deprecated AHB: it's genuinely the right answer for low-power MCU interconnect.
  • Nothing deprecated AXI: it's the right answer for a GPU or NIC talking to memory over a crossbar.
  • CHI exists above AXI, not instead of it.
Protocol | Best for | Notable users
APB | Peripherals / CSR blocks | UART, timers, GPIO everywhere
AHB | MCU CPU interface, small crossbars | Cortex-M0/M3/M4/M7/M33/M55
AXI | High-bandwidth, non-coherent masters | Mali GPU, NIC, NPU, DMA
ACE | 2–8 coherent CPU clusters | CCI-400/500/550, Cortex-A big.LITTLE
CHI | 16+ coherent nodes, mesh NoC | CMN-600/650/700, Neoverse N/V
14

AMBA as Arm's Silent Moat

  • The instruction set (A32/T32/A64) is licensed and revisioned publicly, and Arm enforces those licences in court when it sees fit (the Arm v. Qualcomm/Nuvia dispute is the best-known case).
  • AMBA was given away. Arm charges nothing for the protocol specification — but it now underpins every mainstream SoC.
  • Every verification IP vendor (Cadence, Synopsys, Siemens EDA) ships an AMBA VIP. Every EDA tool recognises the protocol. Every textbook teaches it.
  • This makes an Arm-licensed core the path of least resistance: its interface already matches the rest of your chip.
"AMBA is the mostly-invisible gravity well of Arm's ecosystem. Nobody pays for it. Nobody talks about it at launches. But you cannot exit it without paying an enormous integration tax." — Analyst blog, Linley Group (2019)

RISC-V's AMBA pragmatism

Even RISC-V cores (SiFive, Ventana, NVIDIA NVCore) expose AXI/CHI externally — because every IP around them speaks AMBA. The open ISA has not dislodged Arm's interconnect stack.

15

How the Specs Are Published

  • Each AMBA protocol has a dedicated Arm IHI reference document:
    • IHI 0011 — AMBA Specification Rev 2.0 (AHB/ASB/APB, 1999, now historical).
    • IHI 0033 — AHB / AHB-Lite / AHB5.
    • IHI 0024 — APB / APB3 / APB4 / APB5.
    • IHI 0022 — AXI, AXI4, AXI5.
    • IHI 0039 — ACE, ACE5, ACE-Lite.
    • IHI 0050 — CHI (issues A–F).
  • All downloadable free from developer.arm.com after a click-through EULA.

What the EULA actually allows

Royalty-free implementation rights. You can build an AMBA-compatible SoC, VIP, or core with no payment to Arm.

What you cannot do is claim certification without Arm's testing programmes — but nobody enforces that commercially; the market rewards de-facto compliance.

Arm also publishes protocol assertions for its AMBA interfaces: checkable properties of the kind that large verification teams run on every customer tape-out.
16

Modern Arm Server Topology

[Figure: Neoverse N2/V2 SoC, AMBA roles at a glance. Six CPU RN-F nodes and two HN-F (Home) nodes on a CMN-700 mesh running CHI-E; NIC (200 GbE), NPU (GEMM), and DMA attached over AXI5/ACE-Lite; DDR5 and HBM3 controllers as SN-F; SMMU + PCIe/CXL bridge; system control on APB5.]
Every wire on this diagram is an AMBA protocol: CHI inside the mesh, AXI/ACE-Lite for accelerators, APB for the system-control fabric.
17

Key Protocol Inflection Points

Synchronous — 1999

AHB made the whole bus rising-edge only. Enabled clock-gating and static timing at any clock speed.

Decoupled — 2003

AXI split control and data into five independent channels. Underpins every modern DRAM controller.

Coherent — 2010

ACE brought directory-less (snoopy) coherency to AMBA. Enabled big.LITTLE and multi-cluster mobile SoCs.

Packetised — 2013

CHI decoupled the protocol from the wire. Any topology — ring, mesh, chiplet — can carry it.

Secure & Partitioned — 2021

CHI-E + MPAM + RME/CCA hooks give QoS and Realm isolation end-to-end through the interconnect.

Chiplet-coherent — 2023

CHI-C2C carries coherency across die-to-die UCIe links — the interconnect follows the silicon substrate.

18

Who Competes with AMBA?

Alternative | Origin | Fate
IBM CoreConnect (PLB/OPB) | 1999 | IBM PowerPC-only; dead outside that
Altera Avalon | 2002 | FPGA-only
OpenCores Wishbone | 2002 | Educational; rarely in commercial SoCs
Sonics SMART Interconnect | 2000s | Acquired by Facebook (2019)
Arteris FlexNoC | 2006 | Exists as NoC above AMBA
Intel IDI / UPI | Internal | Intel CPUs only; no external licensees
RISC-V TileLink | 2017 | SiFive-led; losing to AMBA in downstream IP

Why none have displaced AMBA

  • Network effect: every IP vendor ships AMBA, so picking anything else means rewriting wrappers.
  • VIP economics: Cadence's AMBA VIP is mature; new protocols start from zero.
  • Arm's own IP (CPU, GPU, NPU, interconnect) uses AMBA natively — picking an Arm core means picking AMBA.
TileLink is the interesting one. It is RISC-V's answer — but even SiFive's commercial cores expose AXI externally because customers demand AMBA at SoC boundaries.
19

AMBA in Verification & VIP

  • Commercial AMBA VIP is a multi-hundred-million-dollar market — Cadence (Denali/VIP Catalog), Synopsys (DesignWare VIP), Siemens EDA (Mentor QVIP).
  • Every VIP provides:
    • UVM agents (master/slave/monitor)
    • Protocol compliance checkers (on every beat)
    • Coverage models (bursts, IDs, QoS, exclusive)
    • Formal property libraries
  • Arm ships PVIP (Protocol Verification IP) for AXI/ACE/CHI — formal assertions tied directly to the spec wording.

Why formal works well

AMBA protocols are handshake-based (VALID/READY), with small state. That makes them a near-perfect fit for assertion-based formal verification — you can prove absence of deadlock, starvation, and ordering violations cleanly.
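
One of the properties such checkers encode is that VALID, once asserted, must stay high until the handshake completes. The sketch below checks that rule on a recorded trace; it is a simulation-style check rather than a formal proof, the helper name `valid_stable_violations` is hypothetical, and payload stability is not checked.

```python
def valid_stable_violations(valid, ready):
    """Cycles where VALID dropped although the previous cycle had no handshake."""
    bad = []
    for t in range(1, len(valid)):
        waiting = valid[t - 1] and not ready[t - 1]   # offer made, not yet accepted
        if waiting and not valid[t]:
            bad.append(t)
    return bad

valid = [1, 1, 0, 1, 0]
ready = [0, 1, 0, 0, 0]
violations = valid_stable_violations(valid, ready)
print(violations)   # [4]
```

The drop at cycle 2 is legal because the handshake completed in cycle 1; the drop at cycle 4 is a violation because the offer made in cycle 3 was never accepted.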

Jasper/Questa formal projects on AXI/CHI are one of the dominant EDA workloads today. A modern NoC team will run a 24-hour formal proof suite every night on the fabric.
20

Cultural Moments

"By 2000 we realised AMBA was worth more than any of our cores. Every phone chip had one AHB at its heart, and we had given it away." — Robin Saxby, first Arm CEO, 2010 retrospective
"AXI is the closest thing the semiconductor industry has to a lingua franca. If you want two blocks to talk, you default to AXI and write an adapter if you absolutely have to." — Chief Architect, unnamed US fabless SoC company, 2018

AMBA anecdotes

  • The original AMBA 1 document was 91 pages. CHI-E in 2021 is >2,000.
  • Apple's A-series SoCs used AXI exclusively from the A4 onward, a long way from the Newton project that led Apple to co-found Arm in 1990.
  • The ARM7TDMI datasheet (1994) did not say "AHB" or "AXI" — it exposed a proprietary Arm memory interface. AMBA is the external contract that came later.
  • Every Arm IP release today includes an AMBA compliance statement in its front matter — "this block is an AXI5 manager with ID width 6, burst length up to 256, ..."
21

Interview-Ready Takeaways

  • "Why did AHB replace ASB?" → Tri-state on-die doesn't synthesise well; AHB's unidirectional muxes + synchronous clocking enabled higher speeds and better STA.
  • "Why does AXI have five channels?" → To decouple address/data/response for independent backpressure. A long read burst must not stall a write.
  • "Why did AXI4 remove write interleaving?" → Nobody used it; it doubled the verification state space for zero benefit.
  • "Why do we need CHI if ACE exists?" → ACE scales to ~8 coherent CPUs; beyond that the broadcast snoop fan-out is impractical. CHI uses directory/snoop-filter at Home nodes.
  • "Where does AHB-Lite still make sense?" → Inside an MCU, where the CPU is the only master. Full AHB's arbitration machinery is overhead no MCU needs.
  • "What is AXI4-Stream actually for?" → Unit-rate data pipes where there is no addressing — DSP, video, FPGA pipelines. Effectively an on-chip FIFO interface standard.
  • "Why is CHI-C2C coming?" → Chiplets need coherent memory that looks like one SoC. CHI is the only mature coherent protocol with an open spec; UCIe is the physical-layer substrate.
  • "What do APB5 and AHB5 add?" → Security attributes (NSE/RME), wake-up, user signals, atomic hooks.
22

References

Arm Ltd. AMBA Specification (Rev 2.0, 1999) and AMBA AXI and ACE Protocol Specification (IHI 0022, IHI 0039) — downloadable free from developer.arm.com
Arm Ltd. AMBA 5 CHI Architecture Specification (IHI 0050), Issues A through F
Arm Ltd. AMBA AHB Protocol Specification (IHI 0033), including AHB-Lite and AHB5
Flynn, D. et al. AMBA — Enabling reusable on-chip designs (IEEE Micro, 1997) — the foundational AMBA 1 paper
Furber, S. ARM System-on-Chip Architecture, 2nd ed. (Addison-Wesley, 2000) — Chapter on AMBA history & AHB protocol
Stallings, W. Computer Organization and Architecture, 11th ed. — chapter on on-chip interconnects
Pavan, P. & Sarma, D. System-on-Chip Test Architectures (Morgan Kaufmann, 2007) — AMBA for test access port design
Kessler, R. & Heisler, J. — various IEEE HotChips / HPCA papers on Arm Neoverse and CMN-600/700
Linley Group / TechInsights / SemiAnalysis — ongoing industry analysis of AMBA adoption
Wikipedia — "Advanced Microcontroller Bus Architecture" and "Coherent Hub Interface" — well-sourced cross-references

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.