Mathematics for Machine Learning

Book this companion follows
Mathematics for Machine Learning
Marc Peter Deisenroth, A. Aldo Faisal & Cheng Soon Ong
Cambridge University Press · 2020 · ISBN 978-1-108-47004-9
Free PDF: mml-book.github.io

Pedagogical arc

01 — The book's map

The four pillars (regression, dimensionality reduction, density estimation, classification) and what each chapter contributes to them.

02 – 07 Part I — Foundations

Linear algebra, analytic geometry, matrix decompositions, vector calculus, probability and continuous optimisation — the toolkit Part II will use.

08 – 12 Part II — Four canonical problems

Linear regression as projection, PCA as eigenvectors of the covariance, GMMs trained with EM, and the kernel SVM — built directly from the Part I tools.
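As a small illustration of "linear regression as projection", the sketch below fits a straight line by solving the normal equations and then checks that the residual is orthogonal to both columns of the design matrix. It is not taken from the book or the slides: the helper `fit_line` and the four data points are made up for this example, and plain Python is used so that nothing beyond the standard library is needed.

```python
# Least squares as orthogonal projection, in plain Python.
# Model: y ≈ w0 + w1 * x, i.e. a design matrix X with columns [1, x].
# fit_line and the toy data are illustrative assumptions, not the book's code.

def fit_line(xs, ys):
    """Solve the 2x2 normal equations X^T X w = X^T y for w = (w0, w1)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # non-zero as long as the xs are not all equal
    w0 = (sxx * sy - sx * sxy) / det
    w1 = (n * sxy - sx * sy) / det
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.9, 5.2, 6.9]

w0, w1 = fit_line(xs, ys)
fitted = [w0 + w1 * x for x in xs]              # projection of y onto span{1, x}
residual = [y - f for y, f in zip(ys, fitted)]  # what the projection leaves over

# Orthogonality check: the residual has (numerically) zero inner product
# with each column of the design matrix.
dot_with_ones = sum(residual)
dot_with_x = sum(r * x for r, x in zip(residual, xs))
print(w0, w1, dot_with_ones, dot_with_x)
```

Both inner products come out at zero up to floating-point error, which is the geometric statement that the fitted values are the orthogonal projection of the targets onto the column space.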

Presentations in this series

Part I — Mathematical Foundations

01
Introduction and Motivation →
The four pillars of ML (regression, dimensionality reduction, density estimation, classification) and the mathematical foundations beneath them (linear algebra, analytic geometry, matrix decompositions, vector calculus, probability, continuous optimisation). Interactive map of the book and the dependency graph between chapters.
live · ch. 1
02
Linear Algebra →
Systems of linear equations, matrices, Gaussian elimination, vector spaces, basis & rank, linear & affine maps. Interactive Gaussian-elimination row stepper that solves $A\mathbf{x}=\mathbf{b}$ live.
live · ch. 2
03
Analytic Geometry →
Norms, inner products, lengths & distances, angles, orthonormal bases, orthogonal complements, orthogonal projections, rotations. Interactive projection demo with the ⊥ residual drawn live.
live · ch. 3
04
Matrix Decompositions →
Determinant & trace, eigendecomposition, Cholesky, the singular value decomposition, Eckart–Young low-rank approximation, the matrix-decomposition taxonomy. Interactive SVD image compressor.
live · ch. 4
05
Vector Calculus →
Partial derivatives, the gradient, the Jacobian, gradients of matrix expressions, the chain rule, backpropagation as the chain rule on a DAG, automatic differentiation, multivariate Taylor series. Interactive gradient-descent visualiser on a 2D loss surface.
live · ch. 5
06
Probability and Distributions →
Sample spaces, sum & product rules, Bayes' theorem, summary statistics, the Gaussian (with marginalisation and conditioning), conjugacy, the exponential family, change of variables. Interactive Bayes' theorem demo and bivariate Gaussian explorer.
live · ch. 6
07
Continuous Optimisation →
Gradient descent (with and without momentum), constrained optimisation & Lagrange multipliers, convex sets & functions, linear & quadratic programming, the Lagrange dual. Interactive descent paths on a 2D loss surface with adjustable learning rate and momentum.
live · ch. 7
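To make the "with and without momentum" entry under Continuous Optimisation concrete, here is a minimal sketch on a badly scaled quadratic. Everything in it is an assumption chosen for illustration: the loss $f(x, y) = \tfrac{1}{2}(x^2 + 25y^2)$, the heavy-ball form of the momentum update, the helper names, and the hand-picked learning rate and momentum values. It is not the code behind the interactive demo.

```python
# Gradient descent with and without momentum on f(x, y) = 0.5 * (x**2 + 25 * y**2).
# The loss, step size and momentum value are illustrative assumptions.

def grad(p):
    """Gradient of the quadratic loss at point p = (x, y)."""
    x, y = p
    return (x, 25.0 * y)

def loss(p):
    x, y = p
    return 0.5 * (x * x + 25.0 * y * y)

def descend(start, lr, momentum, steps):
    """Heavy-ball update: v <- momentum * v - lr * grad(p); p <- p + v.

    With momentum = 0.0 this reduces to plain gradient descent.
    """
    p = start
    v = (0.0, 0.0)
    for _ in range(steps):
        g = grad(p)
        v = (momentum * v[0] - lr * g[0], momentum * v[1] - lr * g[1])
        p = (p[0] + v[0], p[1] + v[1])
    return p

start = (1.0, 1.0)
plain = descend(start, lr=0.03, momentum=0.0, steps=100)
heavy = descend(start, lr=0.03, momentum=0.8, steps=100)
print(loss(plain), loss(heavy))
```

For these particular settings the momentum run ends with a much smaller loss after the same number of steps, because plain gradient descent crawls along the flat $x$ direction. Other step sizes or momentum values can behave differently, including diverging.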

Part II — Central Machine Learning Problems

08
When Models Meet Data →
Data & features, empirical risk minimisation, the bias-variance trade-off, regularisation, cross-validation, MLE vs MAP, directed graphical models. Interactive polynomial-fitting demo with live bias/variance and learning-curve readouts.
live · ch. 8
09
Linear Regression →
Least squares as orthogonal projection onto the column space, ridge / MAP regularisation, Bayesian linear regression with posterior predictive distribution, feature maps. Interactive regression playground that updates the conjugate Gaussian posterior point-by-point.
live · ch. 9
10
Dimensionality Reduction with PCA →
Variance-maximisation view, reconstruction-error view, eigenvectors of the covariance, low-rank approximation, PCA in high dimensions, probabilistic PCA, the latent-variable perspective. Interactive 2D → 1D PCA visualiser with live principal-axis fitting.
live · ch. 10
11
Density Estimation with GMMs →
Mixture model likelihood, the EM algorithm derived from the latent-variable view, responsibilities, the lower bound, soft vs hard clustering, model-order selection. Interactive 2D EM animator with step-by-step E and M passes.
live · ch. 11
12
Classification with SVMs →
Separating hyperplanes & the maximum-margin idea, primal hard-margin SVM, soft-margin slack, the Lagrange dual, support vectors, the kernel trick (linear, polynomial, RBF). Interactive 2D kernel SVM demo where you place points and watch the decision boundary change as you switch kernels.
live · ch. 12
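The kernel trick named in the SVM entry can be checked numerically in a few lines. The sketch below assumes 2D inputs and the degree-2 polynomial kernel $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z} + 1)^2$, and compares it with an inner product in an explicit six-dimensional feature space. The function names and the two test points are hypothetical and chosen only for this illustration.

```python
import math

# Kernel trick check for the degree-2 polynomial kernel on 2D inputs.
# poly_kernel, feature_map and the sample points are illustrative assumptions.

def poly_kernel(x, z):
    """k(x, z) = (x . z + 1)^2, computed without leaving the 2D input space."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def feature_map(x):
    """Explicit feature map phi with phi(x) . phi(z) = poly_kernel(x, z)."""
    s = math.sqrt(2.0)
    return (1.0, s * x[0], s * x[1], x[0] ** 2, s * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

x, z = (1.0, 2.0), (3.0, -1.0)
k_implicit = poly_kernel(x, z)                     # one 2D inner product
k_explicit = dot(feature_map(x), feature_map(z))   # one 6D inner product
print(k_implicit, k_explicit)
```

The two values agree up to floating-point error, which is why a dual SVM that only ever needs inner products can work in the larger feature space without constructing it.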

Why this companion?

Deisenroth, Faisal & Ong's book sits at a rare intersection: it is honest mathematics, with axioms, theorems and proofs, written explicitly for ML readers. Most companions either re-derive the maths (and add nothing) or paraphrase the ML parts (and lose the rigour). This series instead visualises what the book proves: every abstract definition gets a draggable picture, every theorem gets a live numerical check, and every algorithm of Part II is traced back, slide-by-slide, to a theorem in Part I.

Read the book and the deck side by side — chapter and section numbers in the slides correspond directly to the book, so you can dive deeper into any proof at any time.