The source drives VALID and the payload; the sink drives READY. A transfer completes on the rising ACLK edge when both are asserted.
// Source side: safe shape
always_ff @(posedge ACLK) begin
  if (!ARESETn) begin
    VALID <= 0;
    data  <= 0;
  end else begin
    if (VALID && READY) begin
      // handshake: advance to the next beat (or drop VALID)
      VALID <= next_valid;
      data  <= next_data;
    end else if (!VALID && have_data) begin
      // present new data without looking at READY
      VALID <= 1;
      data  <= new_data;
    end
    // VALID && !READY: hold VALID and data stable
  end
end
Sink-side back-pressure is just: assert READY whenever it has buffer room.
| AXI version | INCR max | WRAP options |
|---|---|---|
| AXI3 | 16 beats | 2/4/8/16 |
| AXI4 | 256 beats | 2/4/8/16 |
| AXI4-Lite | 1 beat only | — |
| AXI4-Stream | no length concept | — |
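How the address evolves for each burst type can be sketched with a small behavioural model. This is an illustration only: the function name `burst_addresses` and its argument names are assumptions made for this example, and the arithmetic follows the usual FIXED / INCR / WRAP rules (WRAP needs an aligned start and 2, 4, 8 or 16 beats).

```python
def burst_addresses(start, length, size, burst):
    """Return the byte address of every beat in a burst.

    start  : AxADDR
    length : number of beats (AxLEN + 1)
    size   : AxSIZE, so each beat is 1 << size bytes
    burst  : "FIXED", "INCR" or "WRAP"
    """
    nbytes = 1 << size
    aligned = (start // nbytes) * nbytes
    if burst == "FIXED":
        # same address on every beat (FIFO-style access)
        return [start] * length
    if burst == "INCR":
        # first beat uses the start address as given, later beats are aligned
        return [start] + [aligned + i * nbytes for i in range(1, length)]
    if burst == "WRAP":
        if length not in (2, 4, 8, 16):
            raise ValueError("WRAP length must be 2, 4, 8 or 16 beats")
        if start != aligned:
            raise ValueError("WRAP start address must be aligned to the beat size")
        total = nbytes * length                # bytes covered by the whole burst
        lower = (start // total) * total       # wrap boundary
        return [lower + (start - lower + i * nbytes) % total for i in range(length)]
    raise ValueError("unknown burst type")


if __name__ == "__main__":
    for kind in ("FIXED", "INCR", "WRAP"):
        print(kind, [hex(a) for a in burst_addresses(0x1008, 4, 2, kind)])
```

A WRAP burst starting at 0x1008 with four 32-bit beats visits 0x1008, 0x100C, then wraps to 0x1000 and 0x1004, which is the cache-line-fill pattern.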
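4 KB rule: no burst may cross a 4 KB address boundary. The specification's stated reason is that this prevents a burst from straddling two slaves, since address ranges are allocated in units of at least 4 KB. 4 KB is also the smallest MMU page size, so a burst that crossed pages would risk partial permission violations.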
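A master-side check for the 4 KB rule, and one way to split an offending INCR burst, might look like the sketch below. The helper names `crosses_4k` and `split_at_4k` are hypothetical, and the splitter assumes the start address is aligned to the beat size.

```python
BOUNDARY = 4096  # 4 KB


def crosses_4k(start, length, size):
    """True if an INCR burst touches bytes on both sides of a 4 KB boundary."""
    nbytes = 1 << size
    aligned = start & ~(nbytes - 1)
    last_byte = aligned + length * nbytes - 1
    return (start // BOUNDARY) != (last_byte // BOUNDARY)


def split_at_4k(start, length, size):
    """Split an aligned INCR burst into (address, beats) pieces that each
    stay inside one 4 KB region."""
    nbytes = 1 << size
    if start % nbytes:
        raise ValueError("this sketch only handles aligned start addresses")
    pieces = []
    while length > 0:
        room = (BOUNDARY - start % BOUNDARY) // nbytes  # beats left before the boundary
        beats = min(length, room)
        pieces.append((start, beats))
        start += beats * nbytes
        length -= beats
    return pieces


if __name__ == "__main__":
    print(crosses_4k(0x0FF0, 8, 2))
    print([(hex(a), n) for a, n in split_at_4k(0x0FF0, 8, 2)])
```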
| AxSIZE | Beat width |
|---|---|
| 0 | 8 bits (byte) |
| 1 | 16 bits (half) |
| 2 | 32 bits (word) |
| 3 | 64 bits |
| 4 | 128 bits |
| 5 | 256 bits |
| 6 | 512 bits |
| 7 | 1024 bits |
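The beat size selected by AxSIZE (2^AxSIZE bytes) must not exceed the data-bus width. Narrow bursts on a wide bus leave lanes idle: writes mark the active lanes with write strobes, and on reads the consumer has to pick out the valid lanes itself.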
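To make the narrow-transfer point concrete, the sketch below computes which byte lanes carry data for the first beat of a transfer. The names `beat_bytes` and `active_lanes` are made up for this example; the lane arithmetic is the standard "lower lane from the address, upper lane from the aligned address" rule.

```python
def beat_bytes(axsize):
    """Bytes per beat for a given AxSIZE encoding (0..7)."""
    if not 0 <= axsize <= 7:
        raise ValueError("AxSIZE is a 3-bit field")
    return 1 << axsize


def active_lanes(addr, axsize, bus_bytes):
    """Byte lanes used by the first beat at `addr` on a bus `bus_bytes` wide."""
    nbytes = beat_bytes(axsize)
    if nbytes > bus_bytes:
        raise ValueError("beat is wider than the data bus")
    aligned = addr - addr % nbytes
    lower = addr % bus_bytes                  # first lane actually addressed
    upper = aligned % bus_bytes + nbytes - 1  # last lane of the aligned beat
    return list(range(lower, upper + 1))


if __name__ == "__main__":
    # 32-bit beat on a 64-bit bus: only the upper four lanes are used
    print(active_lanes(0x1004, 2, 8))
```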
Every transaction carries an AxID (AWID or ARID). Responses come back tagged with the same ID (BID or RID).
Master issues:
ARID=3 AR0
ARID=5 AR1
ARID=3 AR2
ARID=5 AR3
Slave can return:
RID=5 R1 (ahead — page hit)
RID=5 R3 (after R1, same ID)
RID=3 R0 (ID=3 first outstanding)
RID=3 R2 (after R0, same ID)
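The ordering rule in this example (free reordering between IDs, strict order within one ID) is easy to check in a scoreboard. The sketch below is one possible checker, assuming single-beat reads and a hypothetical `(id, tag)` tuple per transaction.

```python
from collections import defaultdict, deque


def responses_legal(issued, returned):
    """Check that responses keep issue order within each ID.

    issued, returned : lists of (axid, tag) tuples, where `tag` identifies
    the transaction (for example its issue index).
    """
    pending = defaultdict(deque)
    for axid, tag in issued:
        pending[axid].append(tag)
    for axid, tag in returned:
        queue = pending[axid]
        if not queue or queue[0] != tag:
            return False          # unknown response, or overtaking within one ID
        queue.popleft()
    return all(not q for q in pending.values())  # every request was answered


if __name__ == "__main__":
    issued = [(3, 0), (5, 1), (3, 2), (5, 3)]
    print(responses_legal(issued, [(5, 1), (5, 3), (3, 0), (3, 2)]))
    print(responses_legal(issued, [(3, 2), (3, 0), (5, 1), (5, 3)]))
```

The first return order is the one shown above and passes; the second lets R2 overtake R0 on ID 3 and fails.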
AWINTERLEAVE depth ≥ 2.
Why did AXI4 drop write data interleaving? Because almost nobody used it. Masters either couldn't buffer the second write's data, or preferred to serialise writes to avoid corner cases. Verification shops hated it: the state space exploded for no practical benefit.
Write-side ordering on AXI4: AW order defines W order. Read-side ordering per-ID only. Most masters can still issue AWs with different IDs and rely on the slave to reorder responses (B channel) — that was the more important part anyway.
WSTRB[N-1:0] has one bit per byte lane of WDATA. A bit of 1 means "write this byte"; 0 means "ignore it".
// Write 0xAA to byte 3 of a word
// on a 64-bit (8-byte) bus
AWADDR = 0x1000
AWSIZE = 3 (64-bit)
WDATA = 0x0000_0000_AA00_0000
WSTRB = 8'b0000_1000
// Byte 3 = 0xAA; others untouched
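On the slave side, the strobes act as a per-byte write enable. A minimal memory-model sketch of that merge follows; `apply_wstrb` is a hypothetical helper, and the old memory contents in the example are invented for illustration.

```python
def apply_wstrb(old, wdata, wstrb, bus_bytes):
    """Merge `wdata` into `old`, one byte lane per set WSTRB bit."""
    result = old
    for lane in range(bus_bytes):
        if (wstrb >> lane) & 1:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (wdata & mask)
    return result


if __name__ == "__main__":
    old = 0x1122_3344_5566_7788          # previous contents (made up)
    new = apply_wstrb(old, 0x0000_0000_AA00_0000, 0b0000_1000, 8)
    print(hex(new))                      # only byte 3 changes
```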
| RRESP/BRESP | Meaning |
|---|---|
| 2'b00 OKAY | Normal success (non-exclusive) |
| 2'b01 EXOKAY | Exclusive access succeeded (store visible atomically) |
| 2'b10 SLVERR | Slave error (bad address, protected region, internal error) |
| 2'b11 DECERR | Decode error (no slave selected for this address) |
Set ARLOCK/AWLOCK = 1 to mark an exclusive pair.
AxCACHE[3:0] tells the bus what memory type the transaction targets. Crucial for cache maintenance and correctness.
| AxCACHE | Meaning |
|---|---|
| 4'b0000 | Device Non-bufferable (Strongly-Ordered) |
| 4'b0001 | Device Bufferable |
| 4'b0010 | Normal Non-cacheable Non-bufferable |
| 4'b0011 | Normal Non-cacheable Bufferable |
| 4'b1111 | Normal Cacheable Write-Back, Write-Allocate, Read-Allocate |
The MMU page table attributes on Armv8 map directly to AxCACHE when the CPU issues the transaction.
AxPROT[2:0] = 3 attribute bits on every transaction:
AxPROT[0]: 1 = privileged, 0 = unprivileged
AxPROT[1]: 1 = non-secure, 0 = secure
AxPROT[2]: 1 = instruction access, 0 = data access
AxPROT[3]=NSE so a 4-way security state (Root/Realm/Secure/Non-Secure) can be encoded.
AxPROT is produced at the CPU (based on the MMU / EL) and must propagate unaltered through bridges, crossbars, and async FIFOs to the slave. Losing it is a security bug.
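For a testbench monitor, decoding the three baseline AxPROT bits is a one-liner per bit. The sketch below covers AxPROT[2:0] only; the dictionary keys are names chosen for this example, and the NSE extension is deliberately left out.

```python
def decode_axprot(axprot):
    """Decode the 3-bit AxPROT field into named attributes."""
    if not 0 <= axprot <= 0b111:
        raise ValueError("this sketch handles AxPROT[2:0] only")
    return {
        "privileged":  bool(axprot & 0b001),  # bit 0
        "non_secure":  bool(axprot & 0b010),  # bit 1
        "instruction": bool(axprot & 0b100),  # bit 2
    }


if __name__ == "__main__":
    # unprivileged, non-secure data access
    print(decode_axprot(0b010))
```

A bridge check can then be as simple as comparing the decoded value at the master port with the one observed at the slave port for the same transaction.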
AMBA 5 AXI adds SMMU-side-band signals so a master's StreamID can travel with the transaction — crucial for virtualised I/O and IOMMUs:
AXI5 AxSTASH signals let a coherent master hint that a transaction's data should be installed into a target cache, not just delivered to the slave. Used for low-latency NIC → CPU producer-consumer.
Between ~2012 and ~2018 IP vendors (especially on FPGA, via Xilinx) switched register interfaces from APB to AXI4-Lite. Reasons:
TDATA: data (any whole number of bytes)
TVALID, TREADY: VALID/READY handshake
TLAST: end of packet
TKEEP: per-byte "this byte is part of the stream" (0 marks a null byte that may be dropped)
TSTRB: per-byte "this byte is a data byte (vs a position byte)"
TID, TDEST, TUSER: routing & metadata
DSP pipelines don't address memory. They're just chains of "process one beat, hand it to the next stage". AXI4-Stream is that interface standardised.
Because it's so light, you can clock-gate entire stages based on TVALID activity.
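How TLAST and TKEEP turn a sequence of beats back into packets can be sketched in a few lines. This is a monitor-style model, not RTL: `collect_packets` is a hypothetical name, and each tuple stands for a beat that has already been accepted (TVALID and TREADY both high).

```python
def collect_packets(beats, bus_bytes):
    """Rebuild packets from accepted AXI4-Stream beats.

    beats : iterable of (tdata, tkeep, tlast) tuples
    Returns a list of `bytes` objects, one per packet.
    """
    packets = []
    current = bytearray()
    for tdata, tkeep, tlast in beats:
        for lane in range(bus_bytes):
            if (tkeep >> lane) & 1:               # skip null bytes
                current.append((tdata >> (8 * lane)) & 0xFF)
        if tlast:                                 # packet boundary
            packets.append(bytes(current))
            current = bytearray()
    return packets


if __name__ == "__main__":
    beats = [(0x44332211, 0b1111, False), (0x00006655, 0b0011, True)]
    print(collect_packets(beats, 4))
```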
Lite and Stream got AMBA 5 refreshes too — mostly adding security attributes (NSE) and user-signal alignment with the rest of AMBA 5.
Source must not wait for READY before asserting VALID. Creating that combinational loop deadlocks the bus and breaks STA.
Once VALID is asserted, the payload (ADDR, DATA, etc.) must not change until READY takes effect.
RLAST must be high on exactly the last beat. Generating it too early or too late confuses the master's burst counter.
A burst that crosses a 4 KB boundary is protocol-illegal. Split it at the master or have the slave error out.
Slave must wait for WLAST before issuing B. Some slaves combine this with an internal write buffer; you still must respect the ordering.
If Master A is waiting on Slave X's B channel while A holds up Slave Y's AR queue that X in turn waits on, the bus deadlocks. Topology and depth of outstanding transactions must be analysed.
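Crossbars widen AxID by ceil(log2(N_masters)) bits so that each response can be routed back to the master that issued it. If downstream slaves hard-code the ID width, the extra bits are lost and responses can no longer be routed.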
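The ID-widening arithmetic can be sketched as follows. The function names and the choice to place the master index in the upper bits are assumptions for illustration; real interconnects may arrange the bits differently.

```python
def widened_id_width(master_id_width, n_masters):
    """ID width seen by slaves behind a crossbar with `n_masters` inputs."""
    extra = (n_masters - 1).bit_length()   # ceil(log2(n_masters)) without floats
    return master_id_width + extra


def tag_id(master_index, axid, master_id_width):
    """Prepend the master index to the ID on the way to the slave."""
    return (master_index << master_id_width) | axid


def split_id(wide_id, master_id_width):
    """Recover (master_index, original_id) from a returning BID/RID."""
    return wide_id >> master_id_width, wide_id & ((1 << master_id_width) - 1)


if __name__ == "__main__":
    print(widened_id_width(4, 5))          # 4-bit IDs, 5 masters
    print(split_id(tag_id(2, 0x3, 4), 4))
```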
CPU's MMU attributes must match what the CPU drives on AxCACHE, or coherency contracts break.
| Feature | AXI3 | AXI4 | AXI5 |
|---|---|---|---|
| INCR burst len | 1–16 | 1–256 | 1–256 |
| Write interleave | yes | no | no |
| QoS / Region | no | yes | yes |
| USER sideband | no | yes | yes |
| Atomics | Exclusive only | Exclusive only | Atomic* |
| Cache stashing | no | no | yes |
| RME / MTE hooks | no | no | yes |
No registers at all: just pass VALID/DATA/LAST forward and READY backward. Zero latency, zero area. A perfect DV sanity-check fabric for any Stream DUT.
module axis_loopback #(parameter W = 64) (
// slave (sink-side) port
input logic s_tvalid,
output logic s_tready,
input logic [W-1:0] s_tdata,
input logic s_tlast,
// master (source-side) port
output logic m_tvalid,
input logic m_tready,
output logic [W-1:0] m_tdata,
output logic m_tlast
);
assign m_tvalid = s_tvalid;
assign m_tdata = s_tdata;
assign m_tlast = s_tlast;
assign s_tready = m_tready; // back-pressure pass-through
endmodule
Because AXI4-Stream has no address, no ID, and no response channel, a pure combinational pass-through is legal & protocol-compliant.
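A skid buffer is the usual way to re-time a VALID/READY channel without dropping a beat: when the sink deasserts READY, a holding register catches the beat that is already in flight. The cell below registers only the backward READY path; VALID and data still pass through combinationally while the skid register is empty, so add an output flop if the forward path must be cut as well. Drop into any AXI channel to close timing.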
module axi_skid #(parameter W = 32) (
  input  logic         clk, rstn,
  input  logic         s_valid, output logic s_ready,
  input  logic [W-1:0] s_data,
  output logic         m_valid, input  logic m_ready,
  output logic [W-1:0] m_data
);
  logic [W-1:0] skid_q;  // beat caught while the sink was stalled
  logic         skid_v;  // skid register holds a valid beat
  always_ff @(posedge clk or negedge rstn)
    if (!rstn) {skid_v, skid_q} <= '0;
    else if (s_valid && s_ready && !m_ready)
      {skid_v, skid_q} <= {1'b1, s_data};  // sink stalled: park the accepted beat
    else if (m_ready) skid_v <= 1'b0;      // parked beat drained
  assign m_valid = s_valid | skid_v;
  assign m_data  = skid_v ? skid_q : s_data;  // parked beat goes out first
  assign s_ready = !skid_v;  // registered: no combinational path from m_ready
endmodule
Cells like this sit at the heart of most AXI crossbars, bridges, and clock-domain-crossing FIFO front-ends.
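A quick way to convince yourself that the skid cell never drops or duplicates a beat is a cycle-by-cycle software model. The sketch below mirrors the `axi_skid` equations above; `simulate_skid` is a hypothetical helper, and it assumes the source keeps VALID high whenever it still has items to send.

```python
def simulate_skid(items, m_ready_pattern):
    """Cycle model of the skid cell.

    items           : beats the source wants to send, in order
    m_ready_pattern : sink READY value for each simulated cycle
    Returns the beats accepted by the sink, in order.
    """
    src = list(items)
    skid_v, skid_q = False, None
    out = []
    for m_ready in m_ready_pattern:
        s_valid = bool(src)
        s_data = src[0] if src else None
        # combinational outputs, same equations as the RTL
        s_ready = not skid_v
        m_valid = s_valid or skid_v
        m_data = skid_q if skid_v else s_data
        if m_valid and m_ready:
            out.append(m_data)                 # sink accepts a beat
        # register update at the clock edge
        if s_valid and s_ready and not m_ready:
            skid_v, skid_q = True, s_data      # park the beat
        elif m_ready:
            skid_v = False                     # parked beat drained
        if s_valid and s_ready:
            src.pop(0)                         # source-side handshake
    return out


if __name__ == "__main__":
    print(simulate_skid([1, 2, 3, 4], [True, False, False, True, True, True, True]))
```

With the sink stalling for two cycles, beat 2 is parked in the skid register and comes out first once READY returns, so the order 1, 2, 3, 4 is preserved.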
module axil_loopback_reg (
input logic ACLK, ARESETn,
// AW / W / B
input logic AWVALID, input logic [31:0] AWADDR,
output logic AWREADY,
input logic WVALID, input logic [31:0] WDATA,
input logic [3:0] WSTRB, output logic WREADY,
output logic BVALID, output logic [1:0] BRESP,
input logic BREADY,
// AR / R
input logic ARVALID, input logic [31:0] ARADDR,
output logic ARREADY,
output logic RVALID, output logic [31:0] RDATA,
output logic [1:0] RRESP, input logic RREADY
);
logic [31:0] reg_q;
logic aw_hs, w_hs, b_pend, ar_hs, r_pend;
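// simultaneous AW+W one-beat write handshake:
// each READY waits for the other channel's VALID, so AW and W are only
// ever accepted together (READY may depend on VALID, never the reverse)
assign AWREADY = !b_pend && WVALID;
assign WREADY = !b_pend && AWVALID;
assign aw_hs = AWVALID && AWREADY;
assign w_hs = WVALID && WREADY;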
always_ff @(posedge ACLK or negedge ARESETn)
if (!ARESETn) {reg_q, b_pend, r_pend} <= '0;
else begin
// write: byte-strobed update
if (aw_hs && w_hs) begin
for (int i = 0; i < 4; i++)
if (WSTRB[i]) reg_q[i*8 +: 8] <= WDATA[i*8 +: 8];
b_pend <= 1'b1;
end else if (BVALID && BREADY) b_pend <= 1'b0;
// read: one beat
if (ARVALID && ARREADY) r_pend <= 1'b1;
else if (RVALID && RREADY) r_pend <= 1'b0;
end
assign ARREADY = !r_pend;
assign BVALID = b_pend; assign BRESP = 2'b00;
assign RVALID = r_pend; assign RDATA = reg_q; assign RRESP = 2'b00;
endmodule
Read returns whatever was last written — so this slave doubles as a write-read loopback for DV. Point any master's AXI test at it and bounce data off. No memory model needed.