Transformer Architecture

Decoder-only transformer internals — from a visual walkthrough to RTL and from PyTorch to quantisation.

Transformer · Decoder · RTL · PyTorch · Quantisation

Presentations in This Series

  1. Transformer Decoder — Visual Deep Dive →
    Interactive visual walkthrough of every operation inside a decoder-only transformer, from tokenisation to generation.
  2. Transformer Decoder — Every Computation →
    Browser-based forward pass with real tensor values — GQA, RoPE, SwiGLU, RMSNorm, all from scratch in vanilla JS (an RMSNorm sketch in PyTorch follows this list).
  3. Transformer Decoder — RTL Accelerator →
    Synthesisable SystemVerilog implementation of a pre-norm decoder block with KV-cache, plus an 83-test verification suite (the KV-cache idea is sketched in Python after this list).
  4. Karpathy's nanoGPT — Step by Step →
    Interactive presentation walking through every line of Karpathy's ~200-line GPT — tokenisation, self-attention, blocks, training, generation.
  5. Transformer Decoder from Scratch →
    Step-by-step PyTorch implementation in a single Jupyter notebook — every component built from first principles and visualised (a causal self-attention sketch follows this list).
  6. Neural Network Data Types →
    SystemVerilog implementations of 9 numerical formats (FP32 down to FP4) used in NN training and inference hardware.
  7. Hardware-Aware Quantisation →
    Interactive quantisation explorer — formats, weight distributions, schemes, simulated inference, hardware cost models, and mixed precision (a symmetric int8 sketch follows this list).
  8. AI Matrix Multiplier Units (MMUL) →
    Hardware deep dive on matrix-multiply units: a 24-slide presentation (Kung 1978/82 → TPU → Blackwell; four dataflow architectures; Transformer mapping; FP32 → FP4, MXFP, and NVFP4; real systems; memory hierarchy; power and thermals), plus four parameterised SystemVerilog implementations, a Python golden model, 258 passing tests, and Vivado synthesis results (a toy systolic-array simulation follows this list).
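
The sketches below are minimal, hedged illustrations of techniques named in the items above; function names, shapes, and defaults are my own choices for illustration, not code lifted from the presentations.

Item 2 builds RMSNorm from scratch in vanilla JS. Here is a minimal PyTorch equivalent (the eps default of 1e-6 is an assumption): scale each feature vector by the reciprocal of its root mean square, then apply a learned gain, with no mean subtraction and no bias, unlike LayerNorm.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Divide by the root mean square over the feature dimension,
    # then apply a learned per-feature gain (no mean subtraction, no bias).
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight

x = torch.randn(2, 4, 8)       # (batch, seq, d_model)
w = torch.ones(8)              # gain, initialised to 1
print(rms_norm(x, w).shape)    # torch.Size([2, 4, 8])
```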
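Item 3's decoder block carries a KV-cache. A Python sketch of the idea, assuming single-head attention and batch size 1 (the dict-based cache is illustrative): at each generation step only the new token's key and value are computed and appended, and attention then runs over the full cached history.

```python
import torch

def attend_with_cache(q, k_new, v_new, cache):
    # Append this step's key/value to the cache, then attend over the
    # whole history. Shapes: q, k_new, v_new are (batch, 1, d_head).
    cache["k"] = torch.cat([cache["k"], k_new], dim=1)
    cache["v"] = torch.cat([cache["v"], v_new], dim=1)
    scores = q @ cache["k"].transpose(1, 2) / cache["k"].shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ cache["v"]

d = 8
cache = {"k": torch.empty(1, 0, d), "v": torch.empty(1, 0, d)}
for step in range(3):  # one new projection set per generated token
    q, k, v = (torch.randn(1, 1, d) for _ in range(3))
    out = attend_with_cache(q, k, v, cache)
print(cache["k"].shape)  # torch.Size([1, 3, 8]): history grows by one per step
```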
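Items 4 and 5 build the decoder in PyTorch. The core of each block is causal multi-head self-attention; a compact sketch under the usual GPT conventions (fused QKV weight, no biases or dropout; the names are mine):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, W_qkv, W_o, n_head):
    # Fused QKV projection, split into heads, causal mask, softmax,
    # weighted sum over values, output projection.
    B, T, C = x.shape
    q, k, v = (x @ W_qkv).split(C, dim=-1)
    q, k, v = (t.view(B, T, n_head, C // n_head).transpose(1, 2) for t in (q, k, v))
    att = (q @ k.transpose(-2, -1)) / (k.shape[-1] ** 0.5)
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    att = att.masked_fill(mask, float("-inf"))   # each token sees only its past
    y = F.softmax(att, dim=-1) @ v
    return y.transpose(1, 2).reshape(B, T, C) @ W_o

B, T, C, H = 2, 5, 16, 4
y = causal_self_attention(torch.randn(B, T, C), torch.randn(C, 3 * C), torch.randn(C, C), H)
print(y.shape)  # torch.Size([2, 5, 16])
```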
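Item 7 explores quantisation schemes. The simplest is symmetric per-tensor int8, sketched below (the 127 clip range and per-tensor scale are one common choice, not the only scheme the explorer covers): a single scale maps the weight range onto the signed 8-bit grid, and dequantisation multiplies back.

```python
import torch

def quantise_int8(w: torch.Tensor):
    # Symmetric per-tensor int8: one scale maps [-max|w|, +max|w|] to [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)
q, scale = quantise_int8(w)
w_hat = q.float() * scale                # dequantise
print((w - w_hat).abs().max().item())    # worst-case rounding error ~ scale / 2
```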
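Item 8 maps Transformer matmuls onto systolic arrays. A cycle-level toy simulation of an output-stationary array (the i + j skewing is the textbook scheme from Kung's papers; this is a behavioural model, not the RTL): operands reach PE(i, j) skewed by i + j cycles, and each PE accumulates one element of C = A @ B.

```python
import torch

def systolic_matmul(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # Output-stationary array: A streams in from the left (skewed by row),
    # B from the top (skewed by column); PE(i, j) accumulates C[i, j].
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = torch.zeros(n, m)
    for t in range(n + m + k - 2):        # pipeline depth of the skewed streams
        for i in range(n):
            for j in range(m):
                step = t - i - j          # operand pair arriving at PE(i, j) now
                if 0 <= step < k:
                    C[i, j] += A[i, step] * B[step, j]
    return C

A, B = torch.randn(3, 4), torch.randn(4, 5)
print(torch.allclose(systolic_matmul(A, B), A @ B))  # True
```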