Presentations in This Series
- Transformer Decoder — Visual Deep Dive → Interactive visual walkthrough of every operation inside a decoder-only transformer, from tokenisation to generation.
- Transformer Decoder — Every Computation → Browser-based forward pass with real tensor values — GQA, RoPE, SwiGLU, RMSNorm, all built from scratch in vanilla JS.
- Transformer Decoder — RTL Accelerator → Synthesisable SystemVerilog implementation of a pre-norm decoder block with a KV-cache, plus an 83-test verification suite.
- Karpathy's nanoGPT — Step by Step → Interactive presentation walking through every line of Karpathy's ~200-line GPT: tokenisation, self-attention, blocks, training, generation.
- Transformer Decoder from Scratch → Step-by-step PyTorch implementation in a single Jupyter notebook — every component built from first principles and visualised.
- Neural Network Data Types → SystemVerilog implementations of nine numerical formats (FP32 down to FP4) used in NN training and inference hardware.
- Hardware-Aware Quantisation → Interactive quantisation explorer: formats, weight distributions, quantisation schemes, simulated inference, hardware cost models, and mixed precision.
- AI Matrix Multiplier Units (MMUL) → Hardware deep dive on the MMUL: a 24-slide presentation (Kung 1978/82 → TPU → Blackwell; four dataflow architectures; Transformer mapping; FP32 → FP4, MXFP, and NVFP4; real systems; memory hierarchy; power and thermals), plus four parameterised SystemVerilog implementations, a Python golden model, 258 passing tests, and Vivado synthesis results.
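Several of the decks above revolve around low-precision number formats and quantisation. As a quick flavour of that material, here is a minimal sketch of symmetric per-tensor INT8 quantisation in plain Python — one of the simplest schemes a quantisation explorer covers. The function names and sample weights are illustrative, not taken from any of the presentations:

```python
# Symmetric per-tensor INT8 quantisation: one scale maps the
# whole tensor's range [-max|w|, +max|w|] onto [-127, 127].
# Illustrative sketch only; real tools handle zero-points,
# per-channel scales, and calibration.

def quantize_int8(weights):
    """Return (int8 codes, scale) for a list of floats."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate floats."""
    return [x * scale for x in q]

weights = [0.021, -1.27, 0.503, 0.984, -0.333]
q, scale = quantize_int8(weights)
recon = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
print("codes:", q, "scale:", scale, "max abs error:", max_err)
```

Because rounding is the only lossy step, the reconstruction error for any in-range weight is bounded by half the scale — the kind of error/hardware-cost trade-off the quantisation and data-type decks explore format by format.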