Thomas Bayes

1701 – 1761 • The Foundations of Inverse Probability

Presbyterian minister, mathematician, and Fellow of the Royal Society whose posthumous essay transformed the science of reasoning under uncertainty.

Probability • Statistics • Theology • Inverse Problems
01 — ORIGINS

Early Life

Thomas Bayes was born in 1701 in Hertfordshire, England, into a prominent Nonconformist family. His father, Joshua Bayes, was one of the first six Nonconformist ministers to be publicly ordained in England, a distinction that signalled the family's deep roots in dissenting Protestantism.

Because Nonconformists were barred from attending Oxford and Cambridge, Bayes was educated at the University of Edinburgh, where he studied logic and theology. He matriculated around 1719 and likely encountered the mathematical ideas of the Scottish Enlightenment during his studies there.

After completing his education, Bayes became a Presbyterian minister, initially assisting his father at the chapel in Leather Lane, London, before taking his own ministry in Tunbridge Wells, Kent, around 1734.

Nonconformist Heritage

Dissenting Protestants faced legal restrictions under the Act of Uniformity (1662), shaping Bayes's education and career path outside the Anglican establishment.

Edinburgh Education

Scotland's universities were open to Dissenters and offered a rigorous curriculum in natural philosophy, logic, and mathematics.

Tunbridge Wells

Bayes served as minister at the Mount Sion chapel from c.1734 until his retirement, living a quiet life of intellectual inquiry.

02 — CAREER

Career & Key Moments

Bayes published little during his lifetime. His first known work was Divine Benevolence (1731), a theological tract defending God's goodness. In 1736, he anonymously published An Introduction to the Doctrine of Fluxions, defending Newton's calculus against the attack Bishop George Berkeley had mounted in The Analyst (1734).

This mathematical defence was significant enough to earn Bayes election as a Fellow of the Royal Society in 1742, despite having published no mathematical work under his own name. The nomination was supported by prominent members who recognised his abilities.

Bayes likely worked on his probability essay throughout the 1740s and 1750s, but never published it. After his death on 7 April 1761, his friend Richard Price discovered the manuscript among his papers, edited it, and communicated it to the Royal Society in 1763.

1731

Divine Benevolence published — a work of theology and moral philosophy.

1736

Anonymous defence of Newtonian calculus against Berkeley's philosophical critique.

1742

Elected Fellow of the Royal Society — recognition of his mathematical talent.

1763

Richard Price publishes "An Essay towards solving a Problem in the Doctrine of Chances" posthumously.

03 — CONTEXT

Historical Context

Bayes worked during the golden age of classical probability, a period sparked by the correspondence of Pascal and Fermat (1654) and formalised by Jacob Bernoulli's Ars Conjectandi (1713) and Abraham de Moivre's The Doctrine of Chances (1718).

The Forward Problem

Early probabilists solved the "forward" problem: given known odds (e.g., a fair die), what is the probability of observed outcomes? Bernoulli, de Moivre, and others built the combinatorial foundations.

The Inverse Problem

Bayes tackled the far harder "inverse" question: given observed data, what can we infer about the underlying cause or probability? This inversion is the heart of his essay.

Enlightenment Thought

The 18th century valued rational inquiry and evidence. Bayes's work aligned with the broader project of using mathematics to reason about the uncertain, observable world.

"Given the number of times in which an unknown event has happened and failed: Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named."

— Thomas Bayes, opening problem statement of the Essay (1763)
04 — CORE CONTRIBUTION

Bayes' Theorem

The central result: P(H|D) = P(D|H) · P(H) / P(D). The posterior probability of a hypothesis H given data D equals the likelihood times the prior, normalised by the evidence.

[Area diagram: the total probability space P(S) = 1 is divided into the hypothesis region P(H), the prior belief before data, and its complement P(¬H). The evidence region P(D) overlaps both, splitting into P(D ∩ H) = P(D|H) · P(H), likelihood times prior, and P(D ∩ ¬H). The posterior is P(H|D) = P(D ∩ H) / P(D) = P(D|H) · P(H) / P(D).]

Area diagram: the posterior is the fraction of the evidence region that falls within the hypothesis region.
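
To make the arithmetic concrete, here is a minimal sketch in Python; the prior and likelihoods are invented for illustration, not taken from the essay.

```python
# Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D), with the evidence P(D)
# expanded by total probability over H and not-H.

def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Return P(H|D) given the prior P(H) and the two likelihoods."""
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)  # evidence P(D)
    return p_d_given_h * p_h / p_d

# Illustrative numbers: prior belief 0.3, data twice as likely under H.
print(posterior(p_h=0.3, p_d_given_h=0.8, p_d_given_not_h=0.4))  # ~0.462
```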

05 — DEEPER DIVE

Unpacking Inverse Probability

Bayes's essay posed a thought experiment: imagine a ball thrown onto a flat table, landing at some unknown position. A second ball is then thrown multiple times. Each time we are told only whether it landed to the left or right of the first ball. From these binary observations, can we infer the position of the first ball?

This billiard-table setup is a brilliant model for inverse probability. The unknown position corresponds to an unknown probability parameter θ. The observations (left/right) are the data. Bayes showed that, given a uniform prior on θ, the posterior distribution is what we now call a Beta distribution: Beta(α + 1, β + 1) where α is successes and β is failures.

Crucially, Bayes did not just state the theorem — he derived the posterior distribution for a binomial parameter, performing the integration needed to normalise it. Richard Price added a philosophical preface arguing the result supported the existence of divine order.
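
The setup is easy to simulate. A sketch using only the Python standard library and the Beta(α + 1, β + 1) result just stated; the throw count of 20 is arbitrary.

```python
import random

random.seed(1)

# The first ball lands at an unknown position theta on [0, 1].
theta = random.random()

# Throw the second ball n times; record only "left of the first ball" or not.
n = 20
lefts = sum(random.random() < theta for _ in range(n))  # successes (alpha)
rights = n - lefts                                      # failures (beta)

# With a uniform prior the posterior for theta is Beta(lefts + 1, rights + 1),
# whose mean (lefts + 1) / (n + 2) is a point estimate of the hidden position.
posterior_mean = (lefts + 1) / (n + 2)
print(f"true theta = {theta:.3f}, posterior mean = {posterior_mean:.3f}")
```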

Uniform Prior

Bayes assumed the prior probability of θ was uniformly distributed on [0, 1], treating all values as equally likely before observation — an assumption later debated extensively.

Beta Posterior

The result is what we now call conjugate analysis: a Beta prior combined with binomial data yields a Beta posterior, enabling exact computation.

Richard Price's Role

Price did far more than edit — he added worked examples, a philosophical introduction, and an appendix extending Bayes's calculations. Some historians argue Price deserves co-credit for "Bayes' theorem."

06 — CORE CONTRIBUTION

Prior to Posterior: Bayesian Updating

The power of Bayes' theorem lies in iterative updating. Each new piece of evidence transforms the posterior, which then becomes the prior for the next observation.

[Plot: posterior density of θ on [0, 1], showing the uniform prior and the posteriors after 3/5, 12/20, and 60/100 successes, with a reference line at θ = 0.6.]

As data accumulates, the posterior concentrates around the true parameter value (here, θ = 0.6).
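
The figure's numbers can be checked directly. A sketch using the closed-form mean and standard deviation of the Beta posterior (standard library only, no plotting):

```python
# Posterior concentration under a uniform Beta(1, 1) prior: after s successes
# in n trials the posterior is Beta(s + 1, n - s + 1), and its standard
# deviation shrinks roughly like 1 / sqrt(n).

from math import sqrt

for s, n in [(3, 5), (12, 20), (60, 100)]:
    a, b = s + 1, n - s + 1
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    print(f"{s}/{n}: posterior mean = {mean:.3f}, sd = {sd:.3f}")
```

The means move toward 0.6 while the spread falls from about 0.17 to about 0.05, which is exactly the concentration the figure depicts.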

07 — DEEPER DIVE

The Mechanics of Belief Revision

Bayesian updating provides a principled, coherent framework for revising beliefs in light of new evidence. The process is sequential and consistent: the order in which data arrives does not change the final posterior; only the total evidence matters.

Key properties of Bayesian updating:

  • Coherence: A Bayesian agent cannot be "Dutch-booked" — no series of bets can guarantee a loss, because Bayesian probabilities satisfy the axioms of probability.
  • Convergence: Under mild conditions, as data accumulates, Bayesian posteriors converge to the truth regardless of the prior (Doob's theorem). The prior "washes out."
  • Subjectivity of priors: Different agents may start with different priors, but with enough shared data, their posteriors will converge — a powerful form of objectivity emerging from subjectivity.

The Update Process

1. Start with Prior P(H)

Initial belief about hypothesis before seeing data

2. Observe Data D

Compute likelihood P(D|H) for each hypothesis

3. Apply Bayes' Rule

Multiply prior by likelihood, normalise by P(D)

4. Posterior becomes new Prior

Ready for the next observation — iterate
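
A minimal sketch of this loop over a discrete grid of hypotheses; the coin-bias setting and the data are invented for illustration.

```python
# Steps 1-4 for a discrete set of hypotheses about a coin's bias.

grid = [k / 10 for k in range(11)]      # hypotheses: theta in {0.0, ..., 1.0}
belief = [1 / len(grid)] * len(grid)    # 1. start with a flat prior P(H)

def update(prior, heads):
    """One Bayesian update after observing a single coin flip."""
    likelihood = [t if heads else 1 - t for t in grid]   # 2. P(D|H) per hypothesis
    unnorm = [l * p for l, p in zip(likelihood, prior)]  # 3. prior times likelihood,
    evidence = sum(unnorm)                               #    normalised by P(D)
    return [u / evidence for u in unnorm]

for flip in [True, True, False, True]:  # 4. posterior becomes the new prior
    belief = update(belief, flip)

best = max(zip(belief, grid))
print(f"most probable bias: {best[1]} (posterior {best[0]:.3f})")
```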

08 — CORE CONTRIBUTION

The Problem of Induction

Bayes's work directly addresses the philosophical problem of induction: how can we reason from specific observations to general conclusions? David Hume, Bayes's contemporary, had argued in his Treatise of Human Nature (1739) that induction had no rational justification.

Bayes's theorem offers a mathematical response: while we cannot achieve certainty from finite observations, we can quantify our degree of rational belief. The posterior probability provides an exact measure of how much the evidence supports a hypothesis.

Richard Price explicitly connected Bayes's mathematics to the problem of induction in his preface to the essay. Price argued that the result showed it was possible to reason rationally about causes from effects — a direct answer to Hume's scepticism about causation.

This philosophical dimension distinguishes Bayes's contribution from mere mathematical technique. It proposed that probability itself is the logic of uncertain reasoning, an idea that would be fully developed two centuries later by Harold Jeffreys, Bruno de Finetti, and E.T. Jaynes.

Hume's Challenge

No number of observations can logically prove a universal law. The sun rising every day does not prove it will rise tomorrow.

Bayes's Response

We cannot attain certainty, but we can calculate that after n consecutive sunrises, P(sunrise tomorrow) = (n+1)/(n+2) — Laplace's later "rule of succession", derived from Bayesian reasoning.
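
The number comes out of a one-line integration, shown here as a LaTeX sketch: with a uniform prior and n straight successes the posterior density is proportional to θⁿ, and the predictive probability is its mean.

```latex
P(\text{success}_{\,n+1} \mid n \text{ successes})
  = \frac{\int_0^1 \theta \cdot \theta^n \, d\theta}
         {\int_0^1 \theta^n \, d\theta}
  = \frac{1/(n+2)}{1/(n+1)}
  = \frac{n+1}{n+2}
```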

Legacy

Bayesian epistemology is now a major school of thought in philosophy of science, formalising how evidence should change belief.

09 — METHOD

Bayes's Method of Reasoning

Bayes's approach was characteristic of Georgian England in its modesty and rigour. He worked from a concrete, physical model (the billiard table) rather than from abstract axioms, building intuition before formalism.

His method involved:

  • Geometric probability: Mapping probability to physical area, making continuous distributions tangible.
  • Careful integration: Computing the normalising constant by integrating the likelihood over all possible parameter values — what we now call the marginal likelihood (see the sketch after this list).
  • Thought experiments: Using the billiard table as a device to make the abstract problem concrete and the assumptions transparent.
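
A sketch of that normalising integral done numerically; the counts (7 successes, 3 failures) are illustrative, and the exact answer 1/(n + 1) follows from the Beta integral.

```python
# Marginal likelihood P(D) = integral over [0, 1] of the binomial likelihood,
# here approximated by a midpoint rule on a fine grid.

from math import comb

s, f = 7, 3                     # illustrative successes and failures
cells = 10_000
dx = 1.0 / cells

def likelihood(theta):
    return comb(s + f, s) * theta**s * (1 - theta) ** f

evidence = sum(likelihood((k + 0.5) * dx) * dx for k in range(cells))

# For a binomial likelihood under a uniform prior the exact value is
# 1 / (n + 1), independent of the split between successes and failures.
print(f"numerical = {evidence:.6f}, exact = {1 / (s + f + 1):.6f}")
```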

Bayes was meticulous about stating his assumptions. His use of a uniform prior was explicit, not hidden, allowing later thinkers to debate and refine this choice. This transparency about assumptions is a hallmark of good Bayesian practice to this day.

"The problem is not about what we know, but about what we can rationally infer from what we observe."

— Paraphrase of Bayes's motivating question

Geometric Intuition

By placing the problem on a table, Bayes made the integral over the parameter space literally an area calculation — elegant and accessible.

Principled Assumptions

Bayes's uniform prior assumes maximal ignorance. Later, Laplace independently derived the same approach, calling it the "principle of insufficient reason."

10 — CONNECTIONS

Connections & Collaborations

[Network diagram: Thomas Bayes at the centre, linked to Richard Price (editor and publisher), Abraham de Moivre (influence on Bayes), Pierre-Simon Laplace (independent rediscovery), David Hume (philosophical foil), and Newton (whose calculus Bayes defended).]

Bayes sat at a nexus of probability, philosophy, and theology. Price transmitted his ideas; Laplace independently developed and extended them; Hume posed the philosophical challenge Bayes answered.

11 — CONTROVERSY

The Great Debate: Bayesian vs. Frequentist

The interpretation of probability spawned one of the most enduring debates in the history of science. Bayesians treat probability as a degree of belief, updated by evidence. Frequentists treat probability as a long-run frequency of events, rejecting the idea that unknown parameters have probability distributions.

In the early 20th century, Ronald Fisher, Jerzy Neyman, and Egon Pearson developed frequentist methods (p-values, confidence intervals, hypothesis testing) that dominated statistics for decades. They regarded Bayesian priors as subjective and unscientific.

The tide began turning in the late 20th century as computational advances (especially Markov Chain Monte Carlo methods in the 1990s) made Bayesian calculations practical for complex models. Today, both paradigms coexist, but Bayesian methods are increasingly dominant in machine learning, genomics, and artificial intelligence.

Frequentist Critique

"Where do priors come from? If two scientists choose different priors, they get different answers from the same data. This is not objective science."

Bayesian Response

"All statistical methods make assumptions. At least Bayesian priors are stated explicitly, not hidden in the choice of test statistic or significance level."

Modern Synthesis

Many modern statisticians are pragmatic, using whichever framework best suits the problem. Bayesian methods excel when priors are informative and models are complex.

12 — LEGACY

Legacy in Modern Mathematics

Bayesian Statistics

A complete framework for statistical inference: prior specification, likelihood construction, posterior computation, and decision theory. Used in clinical trials, A/B testing, and actuarial science.

Machine Learning & AI

Bayesian neural networks, Gaussian processes, probabilistic graphical models, and variational inference all descend from Bayes's core idea. Bayesian reasoning also informs uncertainty estimation and model averaging in modern machine learning.

Bayesian Epistemology

A formal philosophy of science where rational belief change follows Bayes's rule. Influential in philosophy departments worldwide.

Signal Processing

The Kalman filter (1960), used in spacecraft navigation, GPS, and autonomous vehicles, is a real-time Bayesian updater for noisy measurements.
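
A scalar sketch of that updater; the measurements and noise variances are invented, and the state is assumed static so the predict step is trivial.

```python
# One-dimensional Kalman filter: a Gaussian belief over a fixed position,
# refined by noisy measurements. The gain weights prior against data by
# their precisions; this is Bayes' rule specialised to Gaussians.

def kalman_step(mean, var, z, meas_var, process_var=0.0):
    var += process_var                      # predict (static state here)
    gain = var / (var + meas_var)           # Kalman gain
    return mean + gain * (z - mean), (1 - gain) * var   # update

mean, var = 0.0, 1e6                        # vague prior over position
for z in [4.9, 5.2, 5.0, 5.1]:              # illustrative measurements
    mean, var = kalman_step(mean, var, z, meas_var=0.25)

print(f"estimate = {mean:.2f}, variance = {var:.4f}")
```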

"Bayesian inference is the logic of science."

— E.T. Jaynes, Probability Theory: The Logic of Science (2003)
13 — APPLICATIONS

Applications in Science & Engineering

Medical Diagnosis

Bayes' theorem is fundamental to diagnostic testing. Given test sensitivity, specificity, and disease prevalence, it computes the probability a patient truly has the disease after a positive test.
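
A minimal worked example with invented but typical numbers (1% prevalence, 99% sensitivity, 95% specificity):

```python
# Probability of disease given a positive test, via Bayes' theorem.

prevalence = 0.01             # P(disease)
sensitivity = 0.99            # P(positive | disease)
specificity = 0.95            # P(negative | no disease)

p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))   # evidence P(positive)
p_disease = sensitivity * prevalence / p_positive

print(f"P(disease | positive) = {p_disease:.3f}")       # ~0.167
```

Despite the accurate test, a positive result leaves only about a 17% chance of disease, because the low prior (prevalence) dominates.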

Spam Filtering

Naive Bayes classifiers, applied to junk email by Sahami et al. (1998), revolutionised spam filtering. They compute P(spam | words) using Bayes' theorem over word frequencies.
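
A toy sketch of the idea; the four training messages are invented, and real filters use far larger corpora and more careful tokenisation.

```python
# Naive Bayes: score P(class) times the product of P(word | class), with
# Laplace smoothing, and compare classes in log space.

from collections import Counter
from math import log

spam_docs = ["win money now", "free money offer"]
ham_docs = ["meeting notes attached", "lunch tomorrow"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_c, ham_c = word_counts(spam_docs), word_counts(ham_docs)
vocab_size = len(set(spam_c) | set(ham_c))

def log_score(message, counts, class_docs, total_docs):
    score = log(class_docs / total_docs)                      # log prior
    total = sum(counts.values())
    for w in message.split():
        score += log((counts[w] + 1) / (total + vocab_size))  # smoothed
    return score

msg = "free money"
spam = log_score(msg, spam_c, 2, 4) > log_score(msg, ham_c, 2, 4)
print("spam" if spam else "ham")  # spam
```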

Cryptanalysis

Alan Turing used Bayesian reasoning (he called it "Banburismus") to break the Enigma cipher in WWII, calculating posterior probabilities of rotor settings.

Climate Science

Bayesian model averaging combines predictions from multiple climate models, weighting each by how well it fits historical data.

Forensic Science

DNA profile matching uses Bayesian likelihood ratios to quantify the strength of evidence connecting a suspect to a crime scene.

Search & Rescue

The US Coast Guard uses Bayesian search theory (SAROPS) to optimally allocate search resources, updating the probability map of a missing vessel's location as new data arrives.
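
The core update is simple to sketch; the three-cell map, prior, and detection probability below are invented for illustration.

```python
# Bayesian search: searching a cell and finding nothing scales that cell's
# probability by (1 - detection probability); renormalising is Bayes' rule.

prob_map = {"A": 0.5, "B": 0.3, "C": 0.2}   # prior over the vessel's location
p_detect = 0.8                               # P(found | searched correct cell)

def search_and_miss(cell):
    prob_map[cell] *= 1 - p_detect           # likelihood of the miss
    total = sum(prob_map.values())
    for c in prob_map:
        prob_map[c] /= total

search_and_miss("A")                         # searched A, found nothing
print(prob_map)                              # mass shifts toward B and C
```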

14 — TIMELINE

Life & Legacy Timeline

1701 • Born in Hertfordshire, England
c.1719 • Matriculates at the University of Edinburgh
1731 • Publishes Divine Benevolence
c.1734 • Becomes minister at Tunbridge Wells
1736 • Anonymously defends Newton's calculus
1742 • Elected Fellow of the Royal Society
1761 • Dies 7 April; buried in Bunhill Fields, London
1763 • Essay published posthumously by Richard Price
1774 • Laplace independently derives Bayes' theorem
1812 • Laplace's Théorie analytique des probabilités fully develops Bayesian methods
15 — READING

Recommended Reading

The Theory That Would Not Die

Sharon Bertsch McGrayne (2011). A brilliant narrative history of Bayes' theorem from 1740 to the 21st century, covering its suppression, rediscovery, and triumph. Accessible to general readers.

Bayesian Data Analysis

Andrew Gelman et al. (3rd ed., 2013). The standard graduate textbook on Bayesian statistics. Rigorous yet practical, with extensive worked examples and computational guidance.

Probability Theory: The Logic of Science

E.T. Jaynes (2003). A passionate, opinionated defence of Bayesian probability as the foundation of all scientific reasoning. Intellectually demanding but deeply rewarding.

An Essay towards solving a Problem in the Doctrine of Chances

Thomas Bayes (1763). The original essay, readable and surprisingly modern. Available freely online through the Royal Society archives.

Bernoulli's Fallacy

Aubrey Clayton (2021). A modern philosophical account of why frequentist statistics went wrong and how Bayesian thinking corrects deep conceptual errors in science.

Statistical Rethinking

Richard McElreath (2nd ed., 2020). A practical introduction to Bayesian statistics using R and Stan, emphasising causal reasoning and model building. Excellent for scientists.

"The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening."

— Thomas Bayes, An Essay towards solving a Problem in the Doctrine of Chances (1763)

Thomas Bayes • 1701–1761 • The quiet minister whose posthumous essay became the foundation of modern inference.