Founded in 2021 by people who left OpenAI because they thought safety should be the centre of the work, not an adjacent function. Five years on, it is one of three western frontier labs and runs the most-trusted assistant in the developer market. The story of how that combination came to be.
Anthropic as an organisation: the founding bargain, the people, the bets, the structure, and the ongoing argument about whether the bargain holds. Like the OpenAI deck, the technical content of Claude itself is in the architecture and safety sub-hubs of the LLMs site; this is the lab as an institution.
Anthropic was founded in early 2021 (incorporated in February, announced publicly later that year) by a group of senior researchers who had left OpenAI within a few months of each other in late 2020 and early 2021. The lab opened in San Francisco's SoMa district with around seven people; it announced a $124 M Series A in May 2021.
The founders' public account of the departure has been consistent: a disagreement about how directly safety research should drive frontier deployment decisions. Not a hostile split — the Amodeis have spoken positively of OpenAI in subsequent interviews — but a genuine difference in emphasis large enough to justify a new institution.
Several of the founders had been the most senior alignment-focused researchers at OpenAI. Dario Amodei was VP of Research; his sister Daniela was head of operations; Tom Brown had led the GPT-3 effort; Sam McCandlish had co-authored the Scaling Laws paper; Jared Kaplan had been the lead author on it; Chris Olah had been doing distill.pub-style mechanistic interpretability work that was pulling in a different direction from the rest of OpenAI's research portfolio.
OpenAI had multiple research lines, of which alignment was one. Anthropic was an attempt to start a frontier lab where the alignment line was the through-line of the whole organisation rather than one strand among several. The open argument is whether this is actually possible: whether you can stay at the frontier and let safety constrain decisions in ways the rest of the field will not. The deck's thesis is that, on the evidence so far, the answer is a conditional yes.
Dario Amodei (CEO). Italian-American. Trained as a computational biophysicist (his PhD work on retinal neural networks shows up in his AI thinking even today). Joined OpenAI in 2016, ran much of the GPT effort and the safety research line. Public face of the company. Famously thoughtful in interviews; his Machines of Loving Grace essay (2024) is one of the more carefully reasoned visions of beneficial AI from a frontier-lab CEO. Personally serious about catastrophic risk and equally serious about building the technology anyway, on the argument that someone will, and it had better be people who care.
Daniela Amodei (President). Worked in political organising and at Stripe before OpenAI; her operational background is unusually diverse for a frontier-lab co-founder. Runs the company day-to-day: org structure, policy, public engagement. The pairing with Dario is genuinely complementary: she is the operational and external lead, he is the research and strategic lead.
Tom Brown. Lead author of the GPT-3 paper at OpenAI; the senior pretraining engineering lead at Anthropic.
Sam McCandlish. Theoretical physicist (Stanford). Co-author of the Scaling Laws paper. Senior research lead.
Jared Kaplan. Theoretical physicist, Johns Hopkins faculty (now on leave). Lead author of the Scaling Laws paper. Anthropic's Chief Scientist for several years.
Jack Clark. Former AI policy lead at OpenAI; UK government adviser. Heads Policy at Anthropic. Author of the long-running Import AI newsletter.
Chris Olah. Self-taught researcher; ran the original distill.pub. Sets the direction of the mechanistic-interpretability programme. One of the most distinctive scientific voices in the lab.
Senior research / engineering co-founders (the founder list of "around eight" varies depending on whether you count the very-earliest hires). The lab's culture is unusually flat for a research org of its size.
Jan Leike joined from OpenAI in mid-2024, after the superalignment team he co-led was wound down. John Schulman followed later that year. Mike Lewis came from FAIR; Daniel Kokotajlo from OpenAI's governance team. The "Anthropic is the place ex-OpenAI alignment people go" pattern that was implicit in the founding has become explicit by 2025.
Anthropic incorporated in 2021 as a Public Benefit Corporation (PBC) in Delaware. PBCs are for-profit but are required to balance shareholder returns against a stated public benefit, in Anthropic's case "the responsible development and maintenance of advanced AI for the long-term benefit of humanity". PBC directors have legal cover to consider mission alongside profit.
In September 2023 the lab introduced a second governance instrument, the Long-Term Benefit Trust: an independent trust that holds a special class of Anthropic stock and, on a schedule tied to the company's growth, gains the right to elect a majority of the board. The trustees hold no equity and are chosen for commitment to the mission rather than financial interest.
The PBC + LTBT structure is an attempt to do better than OpenAI's hybrid form, but it has not yet been tested in a situation where the company's commercial interests and the LTBT's view of the mission diverge sharply. Anthropic itself describes the structure as experimental, and the trustees have publicly said the same. The actual durability of the governance constraints is something we will only learn if and when stress applies.
The first major technical paper out of Anthropic was Constitutional AI: Harmlessness from AI Feedback (Bai et al, December 2022). It sets the alignment-research orientation of the lab.
RLHF as practised at OpenAI in 2022 trained a reward model from human comparisons of outputs. Constitutional AI replaces (or supplements) the human comparisons with AI-generated comparisons, where another model evaluates outputs against an explicit set of principles — the "constitution".
The constitution is a list of plain-English principles ("don't help with illegal activities", "explain your reasoning", "avoid stereotypes", etc.). The model critiques and revises its own outputs against these principles, and the revised outputs become training signal. Far more scalable than human labelling, and more transparent than an opaque reward model.
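A minimal sketch of that critique-and-revise loop follows. The `llm` helper and the two principles are illustrative placeholders rather than Anthropic's actual prompts or API; the point is the shape of the data-generation step, in which the original draft and the constitution-guided revision become a preference pair for the AI-feedback stage.

```python
# Hedged sketch of Constitutional AI's critique-and-revise data generation.
# `llm` is a stand-in for any text-generation call, not an Anthropic API.
from dataclasses import dataclass

CONSTITUTION = [
    "Do not help with illegal activities.",           # paraphrased example principles
    "Avoid stereotypes and explain your reasoning.",
]

def llm(prompt: str) -> str:
    """Hypothetical generation call; substitute whatever model you have access to."""
    raise NotImplementedError

@dataclass
class PreferencePair:
    prompt: str
    rejected: str   # the model's original draft
    chosen: str     # the constitution-guided revision

def constitutional_revision(prompt: str) -> PreferencePair:
    original = llm(prompt)
    revised = original
    for principle in CONSTITUTION:
        critique = llm(
            f"Response:\n{revised}\n\nCritique this response against the principle: {principle}"
        )
        revised = llm(
            f"Response:\n{revised}\n\nCritique:\n{critique}\n\nRewrite the response to address the critique."
        )
    # The (rejected, chosen) pairs provide the AI-generated comparisons that replace
    # or supplement human preference labels in the subsequent RLAIF stage.
    return PreferencePair(prompt=prompt, rejected=original, chosen=revised)
```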
Successive Claude versions have refined and expanded the Constitution. Its publication as a public artefact in 2023 is one of the few explicit alignment-recipe disclosures from any frontier lab. The Constitution is also the technical mechanism behind Claude's character: the warmer, more deliberate prose style that Claude users tend to recognise as distinct from GPT-4o or Gemini. That style is intentional.
Anthropic's flagship product line is the Claude family of assistants. Initial release in early 2023, at which point ChatGPT was three months old.
| Date | Model | What it added |
|---|---|---|
| Mar 2023 | Claude 1 | Constitutional AI in production. Strong refusal handling. 100k context window from Claude 1.3. |
| Jul 2023 | Claude 2 | 100k context at launch (200k with Claude 2.1). Stronger reasoning. First broad enterprise contracts. |
| Mar 2024 | Claude 3 (Haiku, Sonnet, Opus) | Three-tier line. Opus competitive with GPT-4. Vision capability native. |
| Jun 2024 | Claude 3.5 Sonnet | Step jump on coding benchmarks (HumanEval, SWE-bench). Becomes the default coding model for many teams. |
| Oct 2024 | Claude 3.5 Sonnet (new) + Computer Use | First production GUI-controlling agent from a frontier lab. |
| 2025 | Claude 3.7, Claude 4 (Opus / Sonnet / Haiku), extended thinking | Test-time-compute integration. Claude becomes one of two top-tier coding models with GPT-5. |
| Late 2025 → 2026 | Claude 4.5, 4.6, 4.7 with 1 M-token context | Long-context generally available; agent-style use becomes default for developer audiences. |
Claude has been positioned more strongly toward developer and enterprise users than toward consumer chat. ChatGPT far outnumbers Claude.ai in consumer subscribers; the API and Claude Code dominate developer tooling and hold a meaningful share of enterprise contracts. The two labs' product strategies have diverged on this axis since 2023.
Claude Code — the CLI product launched in early 2025 — is one of the more strategically significant Anthropic moves. It positions Claude as a first-class agent for software-engineering work, with terminal access, file-system tools, and a dedicated SDK. By late 2025 it had become a meaningful enterprise revenue line in its own right, and the focal point of much of Anthropic's product investment.
In September 2023 Anthropic published the first version of its Responsible Scaling Policy, a public framework defining capability thresholds at which the lab commits to add specific safety measures or pause development. It was the first such commitment from a frontier lab. Major revisions followed in October 2024 and 2025.
ASL-2: models that show early signs of dangerous capabilities. They can be adversarially probed into producing concerning biology, cyber, or chemistry information, but offer no measurable uplift over a determined search-engine user. Standard responsible disclosure, model cards, and deployment monitoring apply.
ASL-3: models that materially uplift a malicious actor on a CBRN or autonomous-replication threat. Triggers stronger evaluations, deployment safeguards (jailbreak-resistance benchmarks), red-team requirements, and limits on weights distribution.
ASL-4: models capable of substantially autonomously executing tasks that would otherwise require nation-state-level resources. Requires hardened security (the intent is to defend against well-resourced adversaries, including state actors trying to exfiltrate weights), independent red-team approval to deploy, and multi-stakeholder oversight.
ASL-5: speculative; reserved for models substantially exceeding human-expert capability across most domains. Trigger thresholds and required mitigations are being researched rather than committed to.
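To make the threshold-to-mitigation structure concrete, here is an illustrative encoding of the ladder as a small configuration table. The wording is paraphrased from the descriptions above, not copied from the policy itself.

```python
# Illustrative (unofficial) encoding of the ASL ladder described above.
ASL_LADDER = {
    "ASL-2": {
        "trigger": "early signs of dangerous capability; no uplift over a determined web search",
        "mitigations": ["model cards", "responsible disclosure", "deployment monitoring"],
    },
    "ASL-3": {
        "trigger": "material uplift on CBRN or autonomous-replication threats",
        "mitigations": ["stronger evaluations", "jailbreak-resistance safeguards",
                        "red-team requirements", "limits on weights distribution"],
    },
    "ASL-4": {
        "trigger": "autonomous execution of tasks needing nation-state-level resources",
        "mitigations": ["security hardened against state-level exfiltration",
                        "independent red-team approval", "multi-stakeholder oversight"],
    },
    "ASL-5": {
        "trigger": "substantially exceeds human experts across most domains",
        "mitigations": ["under research; not yet committed to"],
    },
}
```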
The RSP is one of the few publicly verifiable commitments any frontier lab has made about its own behaviour. The capabilities thresholds are specific; the required mitigations are specific; the lab has bound itself in writing to actions if those thresholds are reached. Critics have noted that thresholds are still set by Anthropic itself, that "uplift" is hard to operationalise, and that "pause" has never been tested. Defenders note that no other frontier lab has even gone this far publicly.
Mechanistic interpretability is the attempt to reverse-engineer trained neural networks at the level of individual circuits and features — to understand what computation is happening, not just whether a model gives the right answer. Chris Olah has been the central figure in this since the late 2010s, originally at OpenAI, then at Anthropic from the founding.
| Year | Result | What it showed |
|---|---|---|
| 2021 | A Mathematical Framework for Transformer Circuits | Defines the basic building blocks for analysing attention-based circuits. |
| 2022 | In-Context Learning and Induction Heads | Identifies a specific circuit (induction heads) that mechanistically implements in-context learning. |
| 2022 | Toy Models of Superposition | Explains why circuits are hard to identify: features are stored in superposition. |
| 2023 | Towards Monosemanticity (sparse autoencoders / dictionary learning) | Practical recipe for extracting interpretable features, demonstrated on a small model. |
| 2024 | Scaling Monosemanticity | Applies dictionary learning to Claude 3 Sonnet; interpretable features recovered from a production model. |
| 2025 | Circuit-level intervention experiments | Reading and editing model behaviour through interpretable features. The first plausibly safety-relevant interpretability tooling. |
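The dictionary-learning recipe in the 2023–24 rows is concrete enough to sketch. Below is a minimal sparse autoencoder in PyTorch: an overcomplete linear dictionary trained to reconstruct residual-stream activations under an L1 sparsity penalty. Dimensions, the L1 coefficient, and the training loop are illustrative, not Anthropic's actual settings.

```python
# Minimal sparse-autoencoder (dictionary-learning) sketch in PyTorch.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        # Overcomplete dictionary: many more features than residual-stream dimensions.
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, activations: torch.Tensor):
        features = torch.relu(self.encoder(activations))  # sparse, non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the dictionary faithful to the model's activations;
    # the L1 penalty drives most features to zero so each one stays interpretable.
    mse = (reconstruction - x).pow(2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

# Train on activations captured from one layer of the model under study.
sae = SparseAutoencoder(d_model=4096, d_dict=65536)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
batch = torch.randn(1024, 4096)          # stand-in for captured residual-stream activations
opt.zero_grad()
recon, feats = sae(batch)
loss = sae_loss(batch, recon, feats)
loss.backward()
opt.step()
```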
Most alignment work is behavioural: observe outputs, judge them, train. Interp is the bet that this is not enough at higher capability levels, because a sufficiently competent model can pass any behavioural test. If interp succeeds, the lab can look inside a model and verify that its computations are the ones we want. If interp fails (the technique does not scale, or what it reveals does not generalise), the lab's framing of how to deploy frontier systems is in trouble. It is an unusually high-stakes scientific bet, and the lab has resourced Olah's team accordingly; it is one of the strongest interpretability groups in the field.
Anthropic was first to ship several agent-era primitives in production. The agents-and-tools surface area is now the lab's most active product investment.
Claude takes screenshots, returns mouse-and-keyboard actions, and operates a desktop or browser. First production GUI-control model from a frontier lab. Beat OpenAI to market by three months.
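The underlying loop is simple to sketch. In the example below, the model call is a hypothetical stub (the real interface is Anthropic's tool-use API, whose wire format is not reproduced here), and pyautogui stands in for the local executor that turns the model's proposed actions into real clicks and keystrokes.

```python
# Sketch of the screenshot -> action loop behind Computer Use.
import pyautogui  # real library for synthesising mouse and keyboard input

def model_decide(screenshot_bytes: bytes, goal: str) -> dict:
    """Hypothetical: ask the model for the next GUI action, e.g.
    {"type": "click", "x": 310, "y": 142}, {"type": "type", "text": "hello"},
    or {"type": "done"}."""
    raise NotImplementedError

def run_agent(goal: str, max_steps: int = 20) -> None:
    for _ in range(max_steps):
        shot = pyautogui.screenshot()              # capture the current screen
        action = model_decide(shot.tobytes(), goal)
        if action["type"] == "done":               # the model decides it has finished
            break
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"])
```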
CLI agent for software engineering. File-system tools, terminal access, an SDK and a hooks framework. Has been the fastest-growing developer tool in the LLM ecosystem in 2025.
An open JSON-RPC protocol for connecting LLMs to tools and data sources. Designed to be vendor-neutral; OpenAI and Google have since added support. Arguably the most widely adopted piece of open infrastructure the field has produced since the Hugging Face transformers library.
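To give a minimal sense of what MCP looks like on the wire: JSON-RPC 2.0 messages, with methods such as `tools/list` and `tools/call`, typically carried over stdio or HTTP. The sketch below is a toy stdio server with one hypothetical tool and simplified result shapes; it is not a complete or spec-exact implementation.

```python
# Toy MCP-style server speaking JSON-RPC 2.0 over stdin/stdout.
import json
import sys
from datetime import datetime, timezone

TOOLS = [{
    "name": "get_time",                      # hypothetical example tool
    "description": "Return the current UTC time",
    "inputSchema": {"type": "object", "properties": {}},
}]

def handle(request: dict) -> dict:
    method = request.get("method")
    if method == "tools/list":
        result = {"tools": TOOLS}
    elif method == "tools/call":
        result = {"content": [{"type": "text",
                               "text": datetime.now(timezone.utc).isoformat()}]}
    else:
        result = {}                          # initialise and other methods elided in this sketch
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

if __name__ == "__main__":
    for line in sys.stdin:                   # one JSON-RPC message per line
        sys.stdout.write(json.dumps(handle(json.loads(line))) + "\n")
        sys.stdout.flush()
```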
Anthropic has raised the second-most capital of any AI start-up after OpenAI. Roughly:
| Date | Round | Lead / notable | Amount |
|---|---|---|---|
| May 2021 | Series A | Jaan Tallinn, Dustin Moskovitz, others | $124 M |
| Apr 2022 | Series B | Sam Bankman-Fried / FTX (the estate later sold its stake), others | $580 M |
| 2023 | Series C / additional | Spark, Salesforce, Sound Ventures, others | ~$450 M |
| Sep 2023 | Amazon strategic | Amazon up to $4 B; AWS preferred cloud | $4 B (committed) |
| Oct 2023 | Google strategic | Google up to $2 B; GCP available | $2 B (committed) |
| 2024–26 | Subsequent strategic rounds | Amazon further $4 B in 2024; further rounds in 2025 | $15 B+ cumulative |
The unusual thing about the cap table is the dual-cloud strategic arrangement: Amazon and Google are both major investors and both compete to host Anthropic models. Anthropic explicitly negotiated this dual structure. Avoiding single-strategic-investor capture is consistent with the LTBT logic.
Anthropic's fundraising posture has been to take strategic capital from cloud providers without granting any of them OpenAI-Microsoft-style preferred status. AWS hosts Claude through Bedrock; GCP hosts Claude through Vertex; Anthropic's own infrastructure runs on both. The lab's strategic independence has held to date, which is a non-trivial achievement given the size of the cheques involved.
Anthropic in 2026 is roughly 1,500 people, headquartered in SoMa San Francisco with offices in Seattle, London, Zürich and Tokyo (as of 2025). Revenue is publicly reported to be approaching $5 B ARR.
The lab has deliberately built a brand around "trust" rather than "most powerful". In 2023–24 this was sometimes a commercial constraint: users who cared most about frontier capability preferred GPT-4. By 2025–26, with Claude 4.5 / 4.7, the gap has closed and the trust positioning has become commercially valuable rather than constraining. The strategic logic is consistent throughout: be the lab serious people send their hardest decisions to.
The deepest critique of Anthropic, made most often by AI-risk-skeptical observers and by some safety-focused people outside the lab, is: if you really believe AGI is dangerous, don't accelerate the frontier. Anything you build, someone else will use as a benchmark to surpass.
Anthropic's standard response has been threefold: you cannot do relevant safety research far from the frontier; if safety-motivated labs abstain, the frontier gets built anyway by people who care less; and a credible, safety-focused competitor can pull the rest of the industry toward better practice (the "race to the top").
Whether the "race to the top" thesis works empirically is one of the field's important open questions. Five years in, on the evidence available, Anthropic has shipped meaningful safety research, has materially shaped the industry's posture (RSP-style policies are now common), and has not visibly cut corners on safety in deployment. Whether that pattern survives a frontier capability jump that genuinely matters is the test that has not yet happened.