
Linear Algebra for AI / ML

A twelve-deck arc from vectors and matrices to the SVD and tensor algebra — every idea landed inside a real transformer. Up-projection & down-projection, attention as a stochastic matrix, LoRA, RoPE rotations, FlashAttention as a tiled GEMM, and the einsum view of large-model parallelism.

Vectors · Matmul · Projection · Eigen · SVD · LoRA · MLA · RoPE · Attention · FFN · FlashAttention · Einsum

Pedagogical arc

01 – 04  Foundations

Vectors, matrices, matmul, inner products — the language every later deck reads from.

05 – 09  Structure theorems

Projection, eigendecomposition, SVD, QR and gradients — the why behind transformer shape and gradient flow.

10 – 12  Inside the transformer

Attention, the full block forward pass, and the tensor / sharding view that scales it across pods.

Presentations in this series

  1. Vectors & Vector Spaces →
    Vectors, span, basis, linear independence, dimension, coordinates — the geometric foundations behind every embedding. Interactive 2D vector playground.
    live
  2. Matrices as Linear Maps →
    A matrix is a linear map between coordinate spaces. Column picture, row picture, range & null space, rank, MLP layer = matrix-vector. Interactive 2D map visualiser.
    live
  3. Matrix Multiplication Deep-Dive →
Four equivalent views of matmul, block matrices, batched matmul, the arithmetic-intensity argument, why GEMM dominates LLM compute. Interactive matmul step-through; minimal code sketches for this and each later deck follow the list.
    live
  4. Inner Products, Norms & Geometry →
    Dot product, cosine similarity, L1/L2/L∞ norms, orthogonality, Cauchy-Schwarz — the geometry behind embedding similarity and attention scores. Interactive cosine explorer.
    live
  5. Projections — Up & Down →
    Orthogonal projection, projection matrices, dimensionality lift & collapse. Why transformer FFNs go up by 4× then back down. Interactive projection visualiser plus a working SwiGLU.
    live
  6. Eigenvalues & Eigenvectors →
    Eigendecomposition, the spectral theorem, power iteration, PCA, why eigenvalues control gradient flow and Jacobian conditioning. Interactive eigenvector animation.
    live
  7. SVD, Low-Rank & LoRA →
    Singular Value Decomposition, Eckart-Young, truncated SVD, LoRA / DoRA, DeepSeek MLA, low-rank KV-cache compression. Interactive image-rank slider.
    live
  8. Orthogonality, QR & RoPE →
    Gram-Schmidt, QR factorisation, orthogonal matrices, condition number, weight initialisation, RoPE as a stack of 2×2 rotations. Interactive RoPE visualiser.
    live
  9. Gradients, Jacobians & Backprop →
    Derivative as the best linear approximation, the Jacobian, the chain rule as matrix product, JVP vs VJP, the linear-algebraic core of autograd. Interactive backprop walker.
    live
  10. Attention as Linear Algebra →
    Q, K, V as learned projections, softmax-row-stochastic attention matrix, multi-head as block-diagonal projection, masking, scale 1/√d. Interactive attention playground.
    live
  11. Transformer Block Anatomy →
    Walk a tensor through a full block: pre-norm, multi-head attention with all four projections, residual, FFN up-then-down with SwiGLU, residual. Every shape, every matmul.
    live
  12. Tensors, Einsum & Modern Tricks →
    From matrices to tensors, einsum notation, batched / strided / sharded GEMMs, FlashAttention as block matmul, MoE as sparse projection, GQA as shared K/V, parameter / data / tensor parallelism.
    live
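
The sketches below pin down the core move of each remaining deck in a few lines of NumPy; every name, shape and value in them is an illustrative assumption, not the decks' actual code. First, deck 3's equivalent views of matmul: the entry-wise dot-product picture and the sum-of-rank-one outer products give the same product.

```python
# Minimal NumPy check that two of the four matmul views agree:
# (1) entry (i, j) = row i of A dotted with column j of B, and
# (2) A @ B = sum over k of the outer product of A's column k with B's row k.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

entrywise = np.array([[A[i] @ B[:, j] for j in range(B.shape[1])]
                      for i in range(A.shape[0])])                    # dot-product view
outer_sum = sum(np.outer(A[:, k], B[k]) for k in range(A.shape[1]))   # rank-1 sum view

assert np.allclose(entrywise, A @ B)
assert np.allclose(outer_sum, A @ B)
```

The rank-one view is the one the SVD and LoRA decks lean on later.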
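
Deck 4's central formula in one line: cosine similarity is the dot product normalised by both L2 norms. The vectors here are made up.

```python
# cos(theta) = (u . v) / (||u|| ||v||) for two "embedding" vectors.
import numpy as np

u = np.array([1.0, 2.0, 0.5])
v = np.array([0.9, 1.8, 0.7])   # illustrative values
cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos)                      # close to 1.0: nearly the same direction
```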
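
Deck 5's up-then-down projection as a SwiGLU-style FFN sketch, with an illustrative 4× hidden width: project up, gate with SiLU, project back down to d_model.

```python
# SwiGLU-style FFN in NumPy: up-project, gate, down-project. Shapes illustrative.
import numpy as np

d_model, d_hidden = 8, 32          # the ~4x lift the deck describes
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((d_model, d_hidden)) * 0.1
W_up   = rng.standard_normal((d_model, d_hidden)) * 0.1
W_down = rng.standard_normal((d_hidden, d_model)) * 0.1

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x):                 # x: (seq, d_model)
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down   # back to (seq, d_model)

x = rng.standard_normal((5, d_model))
print(swiglu_ffn(x).shape)         # (5, 8)
```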
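
Deck 6's power iteration in a few lines: repeatedly apply the matrix and renormalise, and the iterate converges to the eigenvector of largest eigenvalue. The matrix here is a random symmetric one, purely for illustration.

```python
# Power iteration plus a Rayleigh-quotient estimate of the top eigenvalue.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T                        # symmetric PSD, so eigenvalues are real and >= 0

v = rng.standard_normal(6)
for _ in range(200):
    v = A @ v
    v /= np.linalg.norm(v)
lam = v @ A @ v                    # Rayleigh quotient

print(np.isclose(lam, np.linalg.eigvalsh(A)[-1]))   # matches the largest eigenvalue
```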
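
A minimal LoRA-style sketch for deck 7: keep W frozen and train only a rank-r update, applied as two skinny matmuls instead of an explicit full-size matrix. The matrix names, sizes and rank are illustrative assumptions.

```python
# LoRA sketch: adapted layer computes x @ (W + A @ B) without materialising it.
import numpy as np

d_in, d_out, r = 64, 64, 8
rng = np.random.default_rng(0)
W = rng.standard_normal((d_in, d_out))      # frozen pretrained weight
A = rng.standard_normal((d_in, r)) * 0.01   # trainable, d_in x r
B = np.zeros((r, d_out))                    # trainable, r x d_out (zero init: no change at start)

def lora_linear(x):
    return x @ W + (x @ A) @ B              # two skinny matmuls for the update

print(W.size, A.size + B.size)              # 4096 frozen vs 1024 trainable parameters
```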
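
Deck 8's reading of RoPE: each consecutive pair of dimensions is rotated by a position-dependent angle, so the whole map is orthogonal and norm-preserving. This sketch assumes the interleaved pairing and the conventional base of 10000.

```python
# RoPE sketch: rotate each (2i, 2i+1) pair of a vector by an angle that grows
# with the token position and shrinks with the pair index.
import numpy as np

def rope(x, pos, base=10000.0):
    """x: (d,) with d even; returns x with each dimension pair rotated."""
    half = x.shape[0] // 2
    theta = pos / base ** (np.arange(half) / half)     # one angle per pair
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                    # 2x2 rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.arange(8, dtype=float)
print(np.allclose(np.linalg.norm(rope(q, pos=5)), np.linalg.norm(q)))  # rotations preserve norm
```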
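
Deck 9's chain rule as a matrix product, for a toy two-layer map f(x) = C·relu(B·x): the Jacobian factors as C·diag(relu′(Bx))·B, a JVP multiplies it by a vector on the right, a VJP on the left, and neither needs the full Jacobian.

```python
# JVP (forward mode) and VJP (reverse mode) for f(x) = C @ relu(B @ x),
# checked against the explicitly assembled Jacobian.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
C = rng.standard_normal((2, 5))
x = rng.standard_normal(3)

mask = (B @ x > 0).astype(float)           # relu'(Bx) on the diagonal
J = C @ np.diag(mask) @ B                  # full 2x3 Jacobian

v = rng.standard_normal(3)                 # input-space direction
u = rng.standard_normal(2)                 # output-space covector
jvp = C @ (mask * (B @ v))                 # forward mode, never forms J
vjp = ((u @ C) * mask) @ B                 # reverse mode, never forms J

print(np.allclose(jvp, J @ v), np.allclose(vjp, u @ J))   # True True
```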
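
Deck 10's claim that attention is three learned projections plus one row-stochastic matrix, checked numerically (single head, no mask, toy sizes).

```python
# softmax(Q K^T / sqrt(d_k)) has non-negative rows summing to 1; output is A @ V.
import numpy as np

seq, d_model, d_k = 6, 16, 8
rng = np.random.default_rng(0)
X = rng.standard_normal((seq, d_model))
W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) * 0.1 for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v
scores = Q @ K.T / np.sqrt(d_k)                    # (seq, seq)
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)                 # softmax, row by row

print(np.allclose(A.sum(axis=-1), 1.0))            # row-stochastic: True
out = A @ V                                        # (seq, d_k)
```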
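
Deck 11's walk compressed to its matmuls: pre-norm, attention, residual, FFN, residual. To stay short, this sketch assumes one head, RMSNorm and a plain-ReLU FFN where the deck uses multiple heads and SwiGLU.

```python
# One pre-norm transformer block reduced to its matmuls. Dimensions illustrative.
import numpy as np

d_model, d_head, d_ff, seq = 16, 16, 64, 6
rng = np.random.default_rng(0)
p = lambda *s: rng.standard_normal(s) * 0.1
W_q, W_k, W_v, W_o = p(d_model, d_head), p(d_model, d_head), p(d_model, d_head), p(d_head, d_model)
W_up, W_down = p(d_model, d_ff), p(d_ff, d_model)

def rmsnorm(x):
    return x / np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + 1e-6)

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def block(x):                                        # x: (seq, d_model)
    h = rmsnorm(x)                                   # pre-norm
    Q, K, V = h @ W_q, h @ W_k, h @ W_v              # three of the four projections
    attn = softmax(Q @ K.T / np.sqrt(d_head)) @ V
    x = x + attn @ W_o                               # output projection + residual 1
    x = x + np.maximum(rmsnorm(x) @ W_up, 0) @ W_down  # FFN up-then-down + residual 2
    return x

print(block(rng.standard_normal((seq, d_model))).shape)   # (6, 16)
```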
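
Deck 12's einsum view: batched multi-head attention scores are a single contraction over the head dimension, equivalent to a loop of per-head GEMMs. Axis names and sizes are illustrative.

```python
# Contract the per-head dimension d while keeping batch b, heads h and both
# sequence axes; compare against explicit batched matmul on the last two axes.
import numpy as np

b, h, s, d = 2, 4, 6, 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((b, h, s, d))
K = rng.standard_normal((b, h, s, d))

scores = np.einsum('bhqd,bhkd->bhqk', Q, K) / np.sqrt(d)
ref = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d)
print(np.allclose(scores, ref))    # True
```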

Why a Linear Algebra sub-hub?

Most introductions to transformers either skip the linear algebra entirely — "QKV are just learned projections, don't worry about it" — or front-load a semester of pure maths and never quite reconnect to the architecture. This series threads the needle. Every concept is introduced for its own sake (with proof sketches and geometric intuition), then immediately landed inside a real transformer block. By deck 11 you can read a 100M-parameter model's definition as the linear-algebraic object it actually is.