Presentations in This Series
- Transformer Decoder — Visual Deep Dive → Interactive visual walkthrough of every operation inside a decoder-only transformer, from tokenisation to generation.
- Transformer Decoder — Every Computation → Browser-based forward pass with real tensor values — GQA, RoPE, SwiGLU, RMSNorm, all built from scratch in vanilla JS.
- Transformer Decoder — RTL Accelerator → Synthesisable SystemVerilog implementation of a pre-norm decoder block with a KV-cache, plus an 83-test verification suite.
- Karpathy's nanoGPT — Step by Step → Interactive presentation walking through every line of Karpathy's ~200-line GPT: tokenisation, self-attention, blocks, training, generation.
- Transformer Decoder from Scratch → Step-by-step PyTorch implementation in a single Jupyter notebook — every component built from first principles and visualised.
- Neural Network Data Types → SystemVerilog implementations of nine numerical formats (FP32 down to FP4) used in NN training and inference hardware.
- Hardware-Aware Quantisation → Interactive quantisation explorer: formats, weight distributions, quantisation schemes, simulated inference, hardware cost models, and mixed precision.
- AI Matrix Multiplier Units (MMUL) → Hardware deep dive on the MMUL: a 24-slide presentation (Kung 1978/82 → TPU → Blackwell; four dataflow architectures; Transformer mapping; FP32 → FP4, MXFP, and NVFP4; real systems; memory hierarchy; power and thermals), plus four parameterised SystemVerilog implementations, a Python golden model, 258 passing tests, and Vivado synthesis results.
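Several of the decks above revolve around low-precision number formats and quantisation. As a quick flavour of that material, here is a minimal sketch of symmetric per-tensor INT8 quantisation in plain Python — one of the simplest schemes a quantisation explorer covers. The function names and sample weights are illustrative, not taken from any of the presentations:

```python
# Symmetric per-tensor INT8 quantisation: one scale maps the
# whole tensor's range [-max|w|, +max|w|] onto [-127, 127].
# Illustrative sketch only; real tools handle zero-points,
# per-channel scales, and calibration.

def quantize_int8(weights):
    """Return (int8 codes, scale) for a list of floats."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate floats."""
    return [x * scale for x in q]

weights = [0.021, -1.27, 0.503, 0.984, -0.333]
q, scale = quantize_int8(weights)
recon = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
print("codes:", q, "scale:", scale, "max abs error:", max_err)
```

Because rounding is the only lossy step, the reconstruction error for any in-range weight is bounded by half the scale — the kind of error/hardware-cost trade-off the quantisation and data-type decks explore format by format.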