
Linear Algebra for AI / ML

A twelve-deck arc from vectors and matrices to the SVD and tensor algebra — every idea landed inside a real transformer. Up-projection & down-projection, attention as a stochastic matrix, LoRA, RoPE rotations, FlashAttention as a tiled GEMM, and the einsum view of large-model parallelism.

Vectors · Matmul · Projection · Eigen · SVD · LoRA · MLA · RoPE · Attention · FFN · FlashAttention · Einsum

Pedagogical arc

01 – 04  Foundations

Vectors, matrices, matmul, inner products — the language every later deck reads from.

05 – 09  Structure theorems

Projection, eigendecomposition, SVD, QR and gradients — the why behind transformer shape and gradient flow.

10 – 12  Inside the transformer

Attention, the full block forward pass, and the tensor / sharding view that scales it across pods.

Presentations in this series

  1. Vectors & Vector Spaces →
    Vectors, span, basis, linear independence, dimension, coordinates — the geometric foundations behind every embedding. Interactive 2D vector playground.
    live
  2. Matrices as Linear Maps →
    A matrix is a linear map between coordinate spaces. Column picture, row picture, range & null space, rank, MLP layer = matrix-vector. Interactive 2D map visualiser.
    live
  3. Matrix Multiplication Deep-Dive →
Four equivalent views of matmul, block matrices, batched matmul, the arithmetic-intensity argument, why GEMM dominates LLM compute. Interactive matmul step-through; minimal code sketches for this and each later deck follow the list.
    live
  4. Inner Products, Norms & Geometry →
    Dot product, cosine similarity, L1/L2/L∞ norms, orthogonality, Cauchy-Schwarz — the geometry behind embedding similarity and attention scores. Interactive cosine explorer.
    live
  5. Projections — Up & Down →
    Orthogonal projection, projection matrices, dimensionality lift & collapse. Why transformer FFNs go up by 4× then back down. Interactive projection visualiser plus a working SwiGLU.
    live
  6. Eigenvalues & Eigenvectors →
    Eigendecomposition, the spectral theorem, power iteration, PCA, why eigenvalues control gradient flow and Jacobian conditioning. Interactive eigenvector animation.
    live
  7. SVD, Low-Rank & LoRA →
    Singular Value Decomposition, Eckart-Young, truncated SVD, LoRA / DoRA, DeepSeek MLA, low-rank KV-cache compression. Interactive image-rank slider.
    live
  8. Orthogonality, QR & RoPE →
    Gram-Schmidt, QR factorisation, orthogonal matrices, condition number, weight initialisation, RoPE as a stack of 2×2 rotations. Interactive RoPE visualiser.
    live
  9. Gradients, Jacobians & Backprop →
    Derivative as the best linear approximation, the Jacobian, the chain rule as matrix product, JVP vs VJP, the linear-algebraic core of autograd. Interactive backprop walker.
    live
  10. Attention as Linear Algebra →
    Q, K, V as learned projections, softmax-row-stochastic attention matrix, multi-head as block-diagonal projection, masking, scale 1/√d. Interactive attention playground.
    live
  11. Transformer Block Anatomy →
    Walk a tensor through a full block: pre-norm, multi-head attention with all four projections, residual, FFN up-then-down with SwiGLU, residual. Every shape, every matmul.
    live
  12. Tensors, Einsum & Modern Tricks →
    From matrices to tensors, einsum notation, batched / strided / sharded GEMMs, FlashAttention as block matmul, MoE as sparse projection, GQA as shared K/V, parameter / data / tensor parallelism.
    live
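
The sketches below pin down the core move of each remaining deck in a few lines of NumPy; every name, shape and value in them is an illustrative assumption, not the decks' actual code. First, deck 3's equivalent views of matmul: the entry-wise dot-product picture and the sum-of-rank-one outer products give the same product.

```python
# Minimal NumPy check that two of the four matmul views agree:
# (1) entry (i, j) = row i of A dotted with column j of B, and
# (2) A @ B = sum over k of the outer product of A's column k with B's row k.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

entrywise = np.array([[A[i] @ B[:, j] for j in range(B.shape[1])]
                      for i in range(A.shape[0])])                    # dot-product view
outer_sum = sum(np.outer(A[:, k], B[k]) for k in range(A.shape[1]))   # rank-1 sum view

assert np.allclose(entrywise, A @ B)
assert np.allclose(outer_sum, A @ B)
```

The rank-one view is the one the SVD and LoRA decks lean on later.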
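
Deck 4's central formula in one line: cosine similarity is the dot product normalised by both L2 norms. The vectors here are made up.

```python
# cos(theta) = (u . v) / (||u|| ||v||) for two "embedding" vectors.
import numpy as np

u = np.array([1.0, 2.0, 0.5])
v = np.array([0.9, 1.8, 0.7])   # illustrative values
cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos)                      # close to 1.0: nearly the same direction
```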
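
Deck 5's up-then-down projection as a SwiGLU-style FFN sketch, with an illustrative 4× hidden width: project up, gate with SiLU, project back down to d_model.

```python
# SwiGLU-style FFN in NumPy: up-project, gate, down-project. Shapes illustrative.
import numpy as np

d_model, d_hidden = 8, 32          # the ~4x lift the deck describes
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((d_model, d_hidden)) * 0.1
W_up   = rng.standard_normal((d_model, d_hidden)) * 0.1
W_down = rng.standard_normal((d_hidden, d_model)) * 0.1

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x):                 # x: (seq, d_model)
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down   # back to (seq, d_model)

x = rng.standard_normal((5, d_model))
print(swiglu_ffn(x).shape)         # (5, 8)
```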
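
Deck 6's power iteration in a few lines: repeatedly apply the matrix and renormalise, and the iterate converges to the eigenvector of largest eigenvalue. The matrix here is a random symmetric one, purely for illustration.

```python
# Power iteration plus a Rayleigh-quotient estimate of the top eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T                        # symmetric PSD, so eigenvalues are real and >= 0

v = rng.standard_normal(6)
for _ in range(200):
    v = A @ v
    v /= np.linalg.norm(v)
lam = v @ A @ v                    # Rayleigh quotient

print(np.isclose(lam, np.linalg.eigvalsh(A)[-1]))   # matches the largest eigenvalue
```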
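
A minimal LoRA-style sketch for deck 7: keep W frozen and train only a rank-r update, applied as two skinny matmuls instead of an explicit full-size matrix. The matrix names, sizes and rank are illustrative assumptions.

```python
# LoRA sketch: adapted layer computes x @ (W + A @ B) without materialising it.
import numpy as np

d_in, d_out, r = 64, 64, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out))      # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01   # trainable, d_in x r
B = np.zeros((r, d_out))                    # trainable, r x d_out (zero init: no change at start)

def lora_linear(x):
    return x @ W + (x @ A) @ B              # two skinny matmuls for the update

print(W.size, A.size + B.size)              # 4096 frozen vs 1024 trainable parameters
```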
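
Deck 8's reading of RoPE: each consecutive pair of dimensions is rotated by a position-dependent angle, so the whole map is orthogonal and norm-preserving. This sketch assumes the interleaved pairing and the conventional base of 10000.

```python
# RoPE sketch: rotate each (2i, 2i+1) pair of a vector by an angle that grows
# with the token position and shrinks with the pair index.
import numpy as np

def rope(x, pos, base=10000.0):
    """x: (d,) with d even; returns x with each dimension pair rotated."""
    half = x.shape[0] // 2
    theta = pos / base ** (np.arange(half) / half)     # one angle per pair
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                    # 2x2 rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.arange(8, dtype=float)
print(np.allclose(np.linalg.norm(rope(q, pos=5)), np.linalg.norm(q)))  # rotations preserve norm
```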
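
Deck 9's chain rule as a matrix product, for a toy two-layer map f(x) = C·relu(B·x): the Jacobian factors as C·diag(relu′(Bx))·B, a JVP multiplies it by a vector on the right, a VJP on the left, and neither needs the full Jacobian.

```python
# JVP (forward mode) and VJP (reverse mode) for f(x) = C @ relu(B @ x),
# checked against the explicitly assembled Jacobian.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
C = rng.standard_normal((2, 5))
x = rng.standard_normal(3)

mask = (B @ x > 0).astype(float)           # relu'(Bx) on the diagonal
J = C @ np.diag(mask) @ B                  # full 2x3 Jacobian

v = rng.standard_normal(3)                 # input-space direction
u = rng.standard_normal(2)                 # output-space covector
jvp = C @ (mask * (B @ v))                 # forward mode, never forms J
vjp = ((u @ C) * mask) @ B                 # reverse mode, never forms J

print(np.allclose(jvp, J @ v), np.allclose(vjp, u @ J))   # True True
```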
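
Deck 10's claim that attention is three learned projections plus one row-stochastic matrix, checked numerically (single head, no mask, toy sizes).

```python
# softmax(Q K^T / sqrt(d_k)) has non-negative rows summing to 1; output is A @ V.
import numpy as np

seq, d_model, d_k = 6, 16, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((seq, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) * 0.1 for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)                    # (seq, seq)
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)                 # softmax, row by row

print(np.allclose(A.sum(axis=-1), 1.0))            # row-stochastic: True
out = A @ V                                        # (seq, d_k)
```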
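
Deck 11's walk compressed to its matmuls: pre-norm, attention, residual, FFN, residual. To stay short, this sketch assumes one head, RMSNorm and a plain-ReLU FFN where the deck uses multiple heads and SwiGLU.

```python
# One pre-norm transformer block reduced to its matmuls. Dimensions illustrative.
import numpy as np

d_model, d_head, d_ff, seq = 16, 16, 64, 6
rng = np.random.default_rng(0)
p = lambda *s: rng.standard_normal(s) * 0.1
W_q, W_k, W_v, W_o = p(d_model, d_head), p(d_model, d_head), p(d_model, d_head), p(d_head, d_model)
W_up, W_down = p(d_model, d_ff), p(d_ff, d_model)

def rmsnorm(x):
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + 1e-6)

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def block(x):                                        # x: (seq, d_model)
    h = rmsnorm(x)                                   # pre-norm
    Q, K, V = h @ W_q, h @ W_k, h @ W_v              # three of the four projections
    attn = softmax(Q @ K.T / np.sqrt(d_head)) @ V
    x = x + attn @ W_o                               # output projection + residual 1
    x = x + np.maximum(rmsnorm(x) @ W_up, 0) @ W_down  # FFN up-then-down + residual 2
    return x

print(block(rng.standard_normal((seq, d_model))).shape)   # (6, 16)
```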
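
Deck 12's einsum view: batched multi-head attention scores are a single contraction over the head dimension, equivalent to a loop of per-head GEMMs. Axis names and sizes are illustrative.

```python
# Contract the per-head dimension d while keeping batch b, heads h and both
# sequence axes; compare against explicit batched matmul on the last two axes.
import numpy as np

b, h, s, d = 2, 4, 6, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((b, h, s, d))
K = rng.standard_normal((b, h, s, d))

scores = np.einsum('bhqd,bhkd->bhqk', Q, K) / np.sqrt(d)
ref = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d)
print(np.allclose(scores, ref))    # True
```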

Why a Linear Algebra sub-hub?

Most introductions to transformers either skip the linear algebra entirely — "QKV are just learned projections, don't worry about it" — or front-load a semester of pure maths and never quite reconnect to the architecture. This series threads the needle. Every concept is introduced for its own sake (with proof sketches and geometric intuition), then immediately landed inside a real transformer block. By deck 11 you can read a 100M-parameter model's definition as the linear-algebraic object it actually is.