LLM History Series — Presentation 06

Anthropic — The Safety-First Frontier Lab

Founded in 2021 by people who left OpenAI because they thought safety should be the centre of the work, not an adjacent function. Five years on, it is one of the three Western frontier labs and runs the most trusted assistant in the developer market. This is the story of how that combination came to be.

2021–2026 · Dario Amodei · Daniela Amodei · Constitutional AI · RSP · Claude 1…4.7 · Mech Interp
Founding (2021) Constitutional AI (2022) Claude 1 (2023) RSP (Sep 2023) Computer Use (Oct 2024) Claude 4.x (2025–26)
00

What This Deck Covers

Anthropic as an organisation: the founding bargain, the people, the bets, the structure, and the ongoing argument about whether the bargain holds. Like the OpenAI deck, the technical content of Claude itself is in the architecture and safety sub-hubs of the LLMs site; this is the lab as an institution.

01

The Founding (2021) — Why People Left OpenAI

Anthropic was founded in early 2021 (incorporated in February, announced publicly later that year) by a group of senior researchers who had left OpenAI within a few months of each other in late 2020 and early 2021. The lab opened in San Francisco's SoMa district with around seven people; it announced a $124 M Series A in May 2021.

The founders' public account of the departure has been consistent: a disagreement about how directly safety research should drive frontier deployment decisions. Not a hostile split — the Amodeis have spoken positively of OpenAI in subsequent interviews — but a genuine difference in emphasis large enough to justify a new institution.

We thought it was important to have an organisation that was focused first and foremost on safety, and where the safety researchers had a say in deployment decisions. — Dario Amodei, in multiple interviews after the founding (paraphrased; his framings vary slightly across appearances).

The pre-Anthropic context

Several of the founders had been the most senior alignment-focused researchers at OpenAI. Dario Amodei was VP of Research; his sister Daniela was head of operations; Tom Brown had led the GPT-3 effort; Sam McCandlish had co-authored the Scaling Laws paper; Jared Kaplan had been the lead author on it; Chris Olah had been doing distill.pub-style mechanistic interpretability work that was pulling in a different direction from the rest of OpenAI's research portfolio.

A clean characterisation

OpenAI had multiple research lines, one of which was the alignment line. Anthropic was an attempt to start a frontier lab where the alignment line was the through-line of the whole organisation rather than one of several. The open argument is whether this is actually possible — whether you can stay at the frontier and let safety constrain decisions in ways the rest of the field will not. The deck's thesis is that, on the evidence so far, the answer is a conditional yes.

02

The Amodeis and the Six Co-Founders

DA

Dario Amodei — co-founder, CEO

Stanford (BS physics) → Princeton (PhD biophysics) → Stanford postdoc → Baidu → OpenAI VP Research → Anthropic

Italian-American. Trained as a computational biophysicist (his PhD work on retinal neural networks shows up in his AI thinking even today). Joined OpenAI in 2016, ran much of the GPT effort and the safety research line. Public face of the company. Famously thoughtful in interviews; his Machines of Loving Grace essay (2024) is one of the more carefully reasoned visions of beneficial AI from a frontier-lab CEO. Personally serious about catastrophic risk and equally serious about building the technology anyway, on the argument that someone will and it had better be people who care.

DA

Daniela Amodei — co-founder, President

UC Santa Cruz → political organising → Stripe (head of risk and operations) → OpenAI VP Operations → Anthropic

Worked in political organising and at Stripe before OpenAI; her operational background is unusually diverse for a frontier-lab co-founder. Runs the company day-to-day: org structure, policy, public engagement. The pairing with Dario is genuinely complementary: she is the operational and external lead, he is the research and strategic lead.

The other six co-founders

Tom Brown

Lead author on the GPT-3 paper at OpenAI. Led large-scale pretraining engineering at Anthropic.

Sam McCandlish

Theoretical physicist (Stanford PhD). Co-author of the Scaling Laws paper. Senior research lead.

Jared Kaplan

Theoretical physicist, Johns Hopkins faculty (now on leave). Lead author on Scaling Laws. Anthropic Chief Scientist for several years.

Jack Clark

Former AI policy lead at OpenAI; UK government adviser. Heads Policy at Anthropic. Author of the long-running Import AI newsletter.

Chris Olah

Self-taught researcher; ran the original distill.pub. Sets the direction of the mechanistic-interpretability programme. One of the most distinctive scientific voices in the lab.

Jared Mueller, Tom Henighan, others

Senior research / engineering co-founders (the founder list of "around eight" varies depending on whether you count the very-earliest hires). The lab's culture is unusually flat for a research org of its size.

Several key joiners after founding

Jan Leike joined from OpenAI in mid-2024 after the dissolution of its superalignment team; John Schulman followed later that year. The "Anthropic is the place ex-OpenAI alignment people go" pattern that was implicit in the founding had become explicit by 2025.

03

The PBC Structure and the Long-Term Benefit Trust

Anthropic incorporated in 2021 as a Public Benefit Corporation (PBC) in Delaware. PBCs are for-profit but are required to balance shareholder returns against a stated public benefit, in Anthropic's case "the responsible development and maintenance of advanced AI for the long-term benefit of humanity". PBC directors have legal cover to consider mission alongside profit.

In September 2023 the lab introduced a second governance instrument, the Long-Term Benefit Trust:

How the LTBT works

  • An independent body of five trustees who do not hold equity in Anthropic.
  • Has the right to elect a majority of the Anthropic board over time.
  • Trustees rotate; the trust's purpose is the same long-term-benefit mission.
  • Designed to outlast individual founders, executives or large shareholders.

What it is meant to prevent

  • A repeat of the OpenAI structural collapse where the non-profit board could not enforce its will.
  • Acquisition pressure from a future strategic investor.
  • Founders themselves drifting from the mission once vested.
  • Standard public-company shareholder primacy if the company eventually IPOs.
Honest accounting

The PBC + LTBT structure is an attempt to do better than OpenAI's hybrid form, but it is untested under stress: no situation has yet forced the company's commercial interests and the LTBT's mission view to diverge sharply. Anthropic itself describes the structure as experimental, and the trustees have publicly said the same. The actual durability of the governance constraints is something we will only know if and when stress applies.

04

Constitutional AI — the Founding Technical Bet

The first major technical paper out of Anthropic was Constitutional AI: Harmlessness from AI Feedback (Bai et al., December 2022). It set the alignment-research orientation of the lab.

The idea in two paragraphs

RLHF as practised at OpenAI in 2022 trained a reward model from human comparisons of outputs. Constitutional AI replaces (or supplements) the human comparisons with AI-generated comparisons, where another model evaluates outputs against an explicit set of principles — the "constitution".

The constitution is a list of plain-English principles ("don't help with illegal activities", "explain your reasoning", "avoid stereotypes", etc.). The model critiques and revises its own outputs against these principles, and the revised outputs become training signal. Far more scalable than human labelling, and more transparent than an opaque reward model.
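The critique-and-revise loop of the supervised phase fits in a few lines. This is a hedged sketch, not Anthropic's implementation: `generate` is a hypothetical stand-in for any instruction-following model call, the principles are illustrative, and the real recipe (Bai et al., 2022) follows this phase with RL from AI feedback.

```python
import random

# Illustrative plain-English principles in the spirit of the published constitution.
CONSTITUTION = [
    "Choose the response least likely to assist illegal activity.",
    "Choose the response that explains its reasoning most clearly.",
    "Choose the response least likely to rely on stereotypes.",
]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to any instruction-following model."""
    return "model output for: " + prompt[:40]

def critique_and_revise(user_prompt: str, rounds: int = 2) -> str:
    """Draft a response, then repeatedly critique and revise it against a
    randomly sampled principle. The (prompt, final response) pairs become
    supervised fine-tuning data — no human labels in the loop."""
    response = generate(user_prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = generate(
            f"Critique the response against this principle.\n"
            f"Principle: {principle}\nPrompt: {user_prompt}\nResponse: {response}"
        )
        response = generate(
            f"Revise the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response
```

The same pattern, with the model comparing two outputs against a principle instead of revising one, produces the AI-feedback preference data for the RL phase.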

Why it mattered for the field

  • Demonstrated an alignment-research-as-product engine: a paper from the safety team becomes the core technique behind a shipping model.
  • Made the values built into a model legible — you can read them.
  • Set the template for RLAIF (RL from AI Feedback) at every other lab.

Why it mattered for Anthropic

  • Clear technical identity from day one.
  • The Constitution is published and editable; reflects Anthropic's stated values explicitly.
  • Underwrites the lab's external-trust posture: here is how we make Claude behave, and here are the principles, and you can read them.
The constitutional method continues

Successive Claude versions have refined and expanded the Constitution. The release of Claude's constitution as a public artefact (May 2023, expanded since) is one of the few explicit alignment-recipe disclosures from any frontier lab. The Constitution is also the technical mechanism behind Claude's character — the warmer, more deliberate prose style that Claude users tend to recognise as distinct from GPT-4o or Gemini. It is intentional.

05

The Claude Line (1 → 4.7)

Anthropic's flagship product line is the Claude family of assistants. Initial release in March 2023, when ChatGPT was only a few months old.

Date | Model | What it added
Mar 2023 | Claude 1 | Constitutional AI in production. Strong refusal handling. 100k context window from Claude 1.3.
Jul 2023 | Claude 2 | Stronger reasoning. First broad enterprise contracts. 200k context with Claude 2.1 (Nov 2023).
Mar 2024 | Claude 3 (Haiku, Sonnet, Opus) | Three-tier line. Opus competitive with GPT-4. Native vision capability.
Jun 2024 | Claude 3.5 Sonnet | Step jump on coding benchmarks (HumanEval, SWE-bench). Becomes the default coding model for many teams.
Oct 2024 | Claude 3.5 Sonnet (new) + Computer Use | First production GUI-controlling agent from a frontier lab.
2025 | Claude 3.7, Claude 4 (Opus / Sonnet / Haiku), extended thinking | Test-time-compute integration. Claude becomes one of the two top-tier coding models alongside GPT-5.
Late 2025 → 2026 | Claude 4.5, 4.6, 4.7 with 1M-token context | Long context generally available; agent-style use becomes the default for developer audiences.

The product strategic posture

Claude has been positioned more toward developer and enterprise users than toward consumer chat. ChatGPT dwarfs Claude.ai in consumer subscribers, while the API and Claude Code account for the bulk of Anthropic's revenue and a meaningful share of enterprise contracts. The two labs' product strategies have diverged on this axis since 2023.

Claude Code

Claude Code — the CLI product launched in early 2025 — is one of the more strategically significant Anthropic moves. It positions Claude as a first-class agent for software-engineering work, with terminal access, file-system tools, and a dedicated SDK. By late 2025 it had become a meaningful enterprise revenue line in its own right, and the focal point of much of Anthropic's product investment.

06

Responsible Scaling Policy & the ASL Framework

In September 2023 Anthropic published the first version of its Responsible Scaling Policy, a public framework defining capability thresholds at which the lab commits to add specific safety measures or pause development. It was the first such commitment from a frontier lab. Major revisions followed in October 2024 and 2025.

The AI Safety Levels (ASL)

ASL-2 (current Claude)

Models that show early signs of dangerous capabilities — they can be adversarially probed and can produce concerning biology/cyber/chemistry information, but offer no measurable uplift over a determined search-engine user. Mitigations: standard responsible disclosure, model cards, deployment monitoring.

ASL-3 (anticipated for some 2025/26 models)

Models that materially uplift a malicious actor on a CBRN or autonomous-replication threat. Triggers stronger evaluations, deployment safeguards (jailbreak-resistance benchmarks), red-team requirements, and limits on weights distribution.

ASL-4 (not yet reached)

Models capable of autonomously executing tasks that would otherwise require nation-state-level resources. Requires hardened security (the intent: defend against well-resourced adversaries, including state actors trying to exfiltrate weights), independent red-team approval before deployment, and multi-stakeholder oversight.

ASL-5 and above

Speculative; reserved for models substantially exceeding human-expert capability across most domains. Trigger thresholds and required mitigations are being researched rather than committed to.

Why this is structurally important

The RSP is one of the few publicly verifiable commitments any frontier lab has made about its own behaviour. The capability thresholds are specific; the required mitigations are specific; the lab has bound itself in writing to act if those thresholds are reached. Critics note that the thresholds are still set by Anthropic itself, that "uplift" is hard to operationalise, and that the "pause" commitment has never been tested. Defenders note that no other frontier lab has gone even this far publicly.

07

Mechanistic Interpretability — the Anthropic Programme

Mechanistic interpretability is the attempt to reverse-engineer trained neural networks at the level of individual circuits and features — to understand what computation is happening, not just whether a model gives the right answer. Chris Olah has been the central figure in this since the late 2010s, originally at OpenAI, then at Anthropic from the founding.

The Anthropic interp programme in stages

Year | Result | What it showed
2021 | A Mathematical Framework for Transformer Circuits | Defines the basic building blocks for analysing attention-based circuits.
2022 | In-Context Learning and Induction Heads | Identifies a specific circuit (induction heads) that mechanistically implements in-context learning.
2022 | Toy Models of Superposition | Explains why circuits are hard to identify: features are stored in superposition.
2023–24 | Sparse autoencoders / dictionary learning | Practical recipe for extracting interpretable features from production models; Scaling Monosemanticity (2024) applies it to Claude 3 Sonnet.
2025 | Circuit-level intervention experiments | Reading and editing model behaviour through interpretable features. The first plausibly safety-relevant interpretability tooling.
Mechanistic interpretability is the only way I know to detect a deceptively-aligned model before it actually deceives you. — Chris Olah, paraphrased from multiple talks. The thesis is that behavioural testing is insufficient because a sufficiently capable model can simulate aligned behaviour during testing.
Why interp is the company's flagship safety-research line

Most alignment work is behavioural — observe outputs, judge them, train. Interp is the bet that this is not enough at higher capability levels, because a sufficiently competent model can pass any behavioural test. If interp succeeds, the lab can look inside a model and verify that its computations are the ones we want. If interp fails — if the technique does not scale, or what it reveals does not generalise — the lab's framing of how to deploy frontier systems is in trouble. It is an unusually high-stakes scientific bet, and the lab has accordingly given Olah's team unusual latitude and one of the strongest research cultures in the field.
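The sparse-autoencoder recipe from the dictionary-learning work reduces to a small core: project activations into an overcomplete feature dictionary, keep only a few features active, and train to reconstruct. A minimal NumPy sketch with illustrative sizes — production SAEs train on transformer residual-stream activations with vastly larger dictionaries:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 64, 512    # activation width, overcomplete dictionary size

# Randomly initialised encoder/decoder weights of the sparse autoencoder.
W_enc = rng.normal(0.0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0.0, 0.1, (d_dict, d_model))

def encode(x):
    """ReLU keeps only a handful of dictionary features active per input."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def decode(f):
    """Reconstruct the original activation as a sum of active features."""
    return f @ W_dec

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that enforces sparsity."""
    f = encode(x)
    return np.mean((x - decode(f)) ** 2) + l1_coeff * np.mean(np.abs(f))

acts = rng.normal(size=(8, d_model))   # stand-in for residual-stream activations
print(sae_loss(acts))                  # scalar to minimise by gradient descent
```

Minimising this loss drives each dictionary direction toward a single interpretable feature; the superposition result explains why the dictionary must be much wider than the activation itself.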

08

Computer Use, Claude Code, MCP — the Agent Era

Anthropic was first to ship several agent-era primitives in production. The agents-and-tools surface area is now the lab's most active product investment.

Computer Use (Oct 2024)

Claude takes screenshots, returns mouse-and-keyboard actions, and operates a desktop or browser. First production GUI-control model from a frontier lab. Beat OpenAI to market by three months.
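The observe-act loop behind this is simple to state. A hedged sketch only: `take_screenshot`, `model_step`, and `apply_action` are hypothetical stand-ins, not the actual Anthropic API, which wraps the same loop in its tool-use message format.

```python
def take_screenshot() -> bytes:
    """Hypothetical stand-in: capture the current desktop as an image."""
    return b"<png bytes>"

def model_step(screenshot: bytes, goal: str) -> dict:
    """Hypothetical stand-in for the model call: given the pixels and the
    goal, return one mouse/keyboard action, or signal completion."""
    return {"type": "done"}

def apply_action(action: dict) -> None:
    """Hypothetical stand-in: execute a click / keypress / scroll."""

def run_agent(goal: str, max_steps: int = 50) -> bool:
    """The Computer Use control loop: observe pixels, act, repeat."""
    for _ in range(max_steps):
        action = model_step(take_screenshot(), goal)
        if action["type"] == "done":
            return True
        apply_action(action)
    return False  # step budget exhausted without finishing
```

Everything hard lives inside `model_step` — grounding coordinates in pixels — which is why GUI control lagged text tool-use by two years.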

Claude Code (Feb 2025)

CLI agent for software engineering. File-system tools, terminal access, an SDK and a hooks framework. Has been the fastest-growing developer tool in the LLM ecosystem in 2025.

Model Context Protocol (MCP, Nov 2024)

An open JSON-RPC protocol for connecting LLMs to tools and data sources. Designed to be vendor-neutral; OpenAI and Google have since added support. Arguably the most widely adopted piece of open infrastructure the field has produced since the Hugging Face transformers library.
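At the wire level MCP is ordinary JSON-RPC 2.0. One tool invocation, sketched as plain dicts — the `read_file` tool is hypothetical, and the initialization handshake and transport framing (stdio or HTTP) are omitted:

```python
import json

# Client asks a server to invoke a tool the server has advertised.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",               # MCP method for invoking a tool
    "params": {
        "name": "read_file",              # hypothetical tool name
        "arguments": {"path": "notes.txt"},
    },
}

# Server replies with a result keyed to the same id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "...file contents..."}],
    },
}

wire = json.dumps(request)                # what actually crosses the transport
```

The vendor-neutrality follows from this shape: any client that can speak JSON-RPC can use any MCP server, regardless of which model sits behind the client.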

Why the agent push concentrated at Anthropic

The developer-and-enterprise positioning plus Claude's coding strength made agent tooling the natural product surface: the customers who already trusted Claude with their code were the first to want it operating tools on their behalf.

09

The Funding History — Google, Amazon, Spark, others

Anthropic has raised the second-most capital of any AI start-up after OpenAI. Roughly:

Date | Round | Lead / notable | Amount
May 2021 | Series A | Jaan Tallinn, Dustin Moskovitz, others | $124 M
Apr 2022 | Series B | Sam Bankman-Fried (stake later sold in the FTX bankruptcy), others | $580 M
2023 | Series C / additional | Spark, Salesforce, Sound Ventures, others | ~$450 M
Sep 2023 | Amazon strategic | Amazon up to $4 B; AWS preferred cloud | $4 B (committed)
Oct 2023 | Google strategic | Google up to $2 B; GCP available | $2 B (committed)
2024–26 | Subsequent strategic rounds | Amazon a further $4 B in 2024; further rounds in 2025 | $15 B+ cumulative

The unusual thing about the cap table is the dual-cloud strategic arrangement: Amazon and Google are both major investors and both compete to host Anthropic models. Anthropic explicitly negotiated this dual structure. Avoiding single-strategic-investor capture is consistent with the LTBT logic.

A subtle but important point

Anthropic's fundraising posture has been to take strategic capital from cloud providers without granting any of them OpenAI-Microsoft-style preferred status. AWS hosts Claude through Bedrock; GCP hosts Claude through Vertex; Anthropic's own infrastructure runs on both. The lab's strategic independence has held to date, which is a non-trivial achievement given the size of the cheques involved.

10

Anthropic Today — Org, Culture, External Posture

Anthropic in 2026 is roughly 1,500 people, headquartered in SoMa San Francisco with offices in Seattle, London, Zürich and Tokyo (as of 2025). Revenue is publicly reported to be approaching $5 B ARR.

The org

  • Co-founders still day-to-day involved.
  • Research split into capability, alignment, interpretability, frontier red-team.
  • Product, sales, policy, security, infrastructure as mature, fully staffed engineering functions.
  • Frontier red-team reports outside the deployment chain.

The culture

  • Unusually concentrated in EA / rationalist-adjacent talent in the early years; this has since diversified.
  • Long, careful internal discussion before product changes.
  • Public character: thoughtful, slightly pessimistic, technically dense, often verbose.
  • External engagement skews toward policy, governance, mech interp.

The external posture

  • Does not chase consumer-chat market share aggressively.
  • Publishes alignment work and parts of the Constitution publicly.
  • Engages substantively with UK AISI, US AISI, and EU AI Office.
  • The CEO writes essays. They are read.
A pattern

The lab has deliberately built a brand around "trusted" rather than "most powerful". In 2023–24 this was sometimes a commercial constraint — users on the frontier-capability axis preferred GPT-4. By 2025–26, with Claude 4.5 / 4.7, the gap has closed and the trust positioning has become commercially valuable rather than constraining. The strategic logic is consistent throughout: be the lab serious people send their hardest decisions to.

11

The Argument — Why Build It At All?

The deepest critique of Anthropic, made most often by AI-risk-skeptical observers and by some safety-focused people outside the lab, is: if you really believe AGI is dangerous, don't accelerate the frontier. Anything you build, someone else will use as a benchmark to surpass.

Anthropic's standard response has been:

The "race to the top" thesis

  • The frontier is going to be built whether we participate or not.
  • If we participate, we can shape what frontier deployment looks like.
  • RSP, interp, Constitutional AI propagate to other labs by competitive pressure.
  • We can attract the talent and the contracts that would otherwise go to less-safety-conscious labs.

The honest critiques

  • "Our race-to-the-top is just a race." Building anything advances the field.
  • Commercial pressure compresses safety timelines.
  • The PBC + LTBT structure is untested under stress.
  • If the company's bet is wrong about how to align frontier systems, it has no fallback.
I am scared of the technology I work on. That fear is the central organising principle of how we build this company. — Dario Amodei has expressed this sentiment in multiple interviews and podcasts; the precise phrasing varies but the position has been consistent since the founding.
Where this lands

Whether the "race to the top" thesis works empirically is one of the field's important open questions. Five years in, on the evidence available, Anthropic has shipped meaningful safety research, has materially shaped the industry's posture (RSP-style policies are now common), and has not visibly cut corners on safety in deployment. Whether that pattern survives a frontier capability jump that genuinely matters is the test that has not yet happened.

12

Cheat Sheet

Five turning points

  • Feb 2021 — founded.
  • Dec 2022 — Constitutional AI paper.
  • Sep 2023 — Responsible Scaling Policy.
  • Oct 2024 — Computer Use ships.
  • 2025 — Claude 4 + extended thinking + Claude Code at scale.

The principals

  • Dario Amodei — CEO.
  • Daniela Amodei — President.
  • Tom Brown, Sam McCandlish, Jared Kaplan — senior research.
  • Chris Olah — mechanistic interp.
  • Jack Clark — policy.
  • Jan Leike, John Schulman — senior alignment joiners 2024.

Three pillars

  • Constitutional AI (alignment recipe).
  • Responsible Scaling Policy (deployment commitments).
  • Mechanistic interpretability (the long bet).

What's next in the series

  • 07 — Google DeepMind: where the transformer came from but the products lagged.
  • 08 — Meta, Mistral, xAI, the rest of the frontier.