A guided tour of the dozen academic groups that produced almost every senior researcher at the modern frontier labs. The lineages, the rivalries, the pipelines into industry, and what the labs look like now that most of their best graduates leave for OpenAI on day one.
An anatomy of the academic side of the field. Twelve labs, each with a short technical contribution list and a list of the people who came out of it. The deck pays particular attention to the advisor tree — who trained whom — because in a field this small, the family tree is half the story.
For most of the period covered by deck 02 — that is, 1948 to 2018 — the centre of gravity of NLP research was university labs and their close cousins (IBM Watson, Bell Labs, MSR, BBN). The transformer paper itself came out of an industry lab, but most of its authors were trained in academia, and most of the ideas it integrated came from academic work.
Three things were true that no longer are:
State-of-the-art models in 2014 trained on a handful of GPUs. By 2017 you needed a small TPU pod. By 2024 you need a $100 M cluster. Academic groups simply cannot run frontier experiments anymore.
Pre-2022 the best research was published with code. Post-2022 frontier model details (data mixes, RL recipes, alignment infrastructure) are increasingly proprietary. Academic groups can read the public papers but cannot replicate frontier behaviour.
A senior PhD student in 2015 expected to take a postdoc and then a faculty job. By 2024 the equivalent person mostly goes straight to a frontier lab on $1–3 M/year compensation. The faculty pipeline is collapsing.
Mechanistic interpretability, evals, scientific applications (biology, materials), small-model architecture exploration, theory, and training the next generation of researchers. The biggest scientific contributions of academic NLP in the LLM era have come from not trying to compete on frontier capabilities — the Princeton/Anthropic interp work, Stanford CRFM's evals (HELM), MIT's robotics-foundation-model work, Mila's safety-focused programme.
Hinton joined Toronto in 1987 (after a stint at CMU) and stayed in Canada essentially because Canadian funding still backed neural networks when most of the field had moved on. The CIFAR (Canadian Institute for Advanced Research) neural-computation programme, run with Bengio and LeCun, kept a small but tightly connected community alive through the 1990s and 2000s.
British, descended from George Boole and educated at Cambridge and Edinburgh. Hard-line on the centrality of representation learning. A famously Socratic supervisor: he asks more questions than he gives answers, and his lab is run by consensus. Resigned from Google in May 2023 specifically to be free to speak about AI risk; subsequently shared the 2024 Nobel Prize in Physics with John Hopfield.
When Hinton co-founded the Vector Institute in 2017, the bargain was that frontier-leaning Toronto researchers would have a Canadian industry-aligned home rather than leaving for the US. It worked partially — Cohere's three founders are all Toronto-aligned, and Vector itself is well-funded. But Sutskever, Krizhevsky, Mnih and Graves had all already gone south by then.
Mila started in 2017 as a formal expansion of Bengio's research group at Université de Montréal, with backing from the Quebec government. As of 2026 it counts roughly 1,000 affiliated researchers (faculty, students, postdocs). Bengio's own students dominate the senior ranks of the modern field as much as Hinton's do.
Bengio is unusual among the godfathers in having pivoted, in his sixties, away from capability research and almost entirely toward AI-safety research. His writing from 2023 onward is some of the more lucid public material on existential and catastrophic AI risk by a senior practitioner. Mila has accordingly oriented a meaningful fraction of its work toward safety and interpretability.
Run by Christopher Manning since the early 2000s, Stanford NLP has produced more first-rank NLP researchers than any other group. The lab's work is strikingly broad — parsing, semantics, IR, dialogue, evaluation, theory, embeddings — and its alumni populate every frontier lab.
Australian; PhD at Stanford under Joan Bresnan, faculty at CMU briefly, then Stanford from 1999. Famously calm, methodical, prolific. The author (with Schütze) of Foundations of Statistical Natural Language Processing, the textbook that taught a generation. His Stanford CS224N course is the most widely watched NLP course on the internet.
The Center for Research on Foundation Models (CRFM, launched 2021) and its HELM evaluation framework are Stanford's flagship LLM-era institutions. CRFM is, in effect, an attempt to organise frontier-adjacent academic research at a scale that can stay relevant: shared compute, public benchmarks, and collaborations with frontier labs to evaluate models. It is the only academic effort that has had measurable influence on frontier-lab evaluation practice.
The non-NLP side of Stanford's ML programme has its own trajectory. Andrew Ng's lab in the late 2000s and early 2010s trained much of the core early Google Brain cohort; Percy Liang's group took over much of the formal-LLM work in the 2020s; Christopher Ré's lab on data-centric AI runs in parallel.
British-born, Hong-Kong-raised, MIT and Berkeley educated. Ran Stanford's flagship CS229 Machine Learning for years, putting it on Coursera and effectively kick-starting modern online ML education. Co-founded Google Brain in 2011. Quietly one of the most consequential ML educators ever; his early students include Quoc Le and Adam Coates.
MIT undergrad and master's, then a PhD at Berkeley under Dan Klein; joined Stanford in 2012. Crystal-clear writing style; runs the most empirically rigorous LLM-evaluation programme in academia. HELM, early prompt-engineering work, and much of the post-2022 conceptual framing of LLM behaviour come from his group.
The most entrepreneurially successful Stanford ML faculty member of his generation. Mamba and the entire state-space-model line came partly out of his lab (Albert Gu was his PhD student). Together AI is one of the leading inference-serving companies; SambaNova builds AI ASIC racks; Snorkel AI grew out of his group's programmatic data-labelling work. Running three serious companies concurrently is unusual for a sitting professor.
NYU is the closest US equivalent to Toronto. LeCun moved to NYU in 2003, after his years at Bell Labs and AT&T Research, and built a small but tight neural-nets group. Kyunghyun Cho's arrival in 2015 added a second strong NLP voice; Léon Bottou's intermittent affiliations and Rob Fergus's CV work fill out the picture.
French. PhD in Paris, postdoc with Hinton in Toronto, then Bell Labs (1988–1996) where he built LeNet and ran the convolutional-net programme. The most public of the godfathers — he is on Twitter daily, picks fights with Marcus and others on AGI roadmaps, and is the loudest senior advocate for open-weight frontier research. Joint Turing Award 2018. JEPA / world-model proposal is his current research bet against the pure-LLM roadmap.
Korean. Already covered as a Mila postdoc in deck 02; ran a strong NYU group focused on encoder-decoder generative models and applications to biology. Currently leading drug-discovery foundation-model work at Genentech.
LeCun has used his platform — FAIR Chief Scientist plus a million+ Twitter following — to push Meta toward an aggressive open-weights strategy. He often argues this publicly, in opposition to Hinton's safety-focused turn and Bengio's catastrophic-risk emphasis. The three godfathers, who shared a Turing Award, now publicly disagree on the most important questions facing the field. Deck 08 picks this up under Meta.
CMU is unique in having a standalone Machine Learning Department (since 2006, founded by Tom Mitchell) and a separate Language Technologies Institute. Together they produce a steady stream of PhDs. The signature is depth and breadth in equal measure: CMU graduates are typically strong on both theory and systems.
The senior figure of CMU machine learning. Author of Machine Learning (the textbook). NELL, the Never-Ending Language Learner, was an early bet on continual learning sustained over years, anticipating some of the lifelong-learning ideas now being revisited at frontier labs.
Hinton PhD, MIT postdoc, Toronto faculty from 2011, then CMU from 2016. Deep Boltzmann machines, neural Turing-machine variants, lifelong learning. His 2016–2020 stint at Apple was Apple's most credible attempt to build a frontier-research arm; it ended quietly when Apple's product strategy diverged from open research.
Berkeley's ML programme is less language-focused than Stanford's but produced an outsize share of the people who built modern reinforcement learning, robotics, and AI safety. Stuart Russell's CHAI is one of the two leading academic AI-safety centres alongside Mila.
British. Co-author with Peter Norvig of Artificial Intelligence: A Modern Approach — the field's standard textbook for thirty years. Founded CHAI in 2016 specifically to work on AI alignment from a control-theoretic angle. Author of Human Compatible. Public AI-risk advocate from inside the academic mainstream.
Belgian. PhD with Andrew Ng at Stanford, then Berkeley faculty. Trained much of the modern deep-RL talent pool: Chelsea Finn, John Schulman, and a long line of Berkeley robotics and RL PhDs, with Sergey Levine as long-time collaborator and colleague. Schulman went on to become one of OpenAI's youngest co-founders. Abbeel ran Covariant (industrial robotics; its founders and models were absorbed into Amazon in 2024).
Chinese-American. PhD at Berkeley under David Wagner; faculty since 2007. Sits at the intersection of security and ML. Her group is one of the main academic sources of work on adversarial robustness, model stealing, and AI security.
The remaining major academic centres for LLM-relevant research, more briefly.
MIT's contributions are scattered across several groups. Tommi Jaakkola for theory; Regina Barzilay for chemistry/biology applications; Antonio Torralba for vision; Pulkit Agrawal for robotics; Jacob Andreas for language and reasoning. The Center for Brains, Minds and Machines (CBMM) under Tomaso Poggio runs a separate cognitive-science-flavoured programme. MIT does not have a single dominant LLM lab the way Stanford or CMU does, but its alumni show up throughout the frontier labs.
Smaller programme but highly concentrated. Sanjeev Arora on theory of deep learning; Arvind Narayanan on AI policy and de-hyped evaluation; Danqi Chen (Manning PhD) on retrieval and pretraining; Tri Dao (Ré PhD) on FlashAttention and architectures; Karthik Narasimhan on agents and RL. Punches far above its weight.
Founded 2014 by Paul Allen in Seattle. Run for years by Oren Etzioni, then Ali Farhadi from 2023. Sits between an academic lab and an industry research lab. ELMo (Peters et al., 2018) was theirs; OLMo, the most respected fully-open-weights research model in 2024–2025, is theirs. AI2 explicitly preserves the publish-everything academic ethos that disappeared from frontier labs after 2022.
EPFL (Switzerland, Martin Jaggi's group), ETH Zürich (Thomas Hofmann), Edinburgh (Mirella Lapata, Ivan Titov), Cambridge (David Krueger before AISI, Carl Henrik Ek), Oxford (Yarin Gal, Phil Torr), UCL (Sebastian Riedel before Meta, Ed Grefenstette), Tsinghua, Peking & Fudan (the senior Chinese ML faculty — Andrew Yao, Maosong Sun, Xipeng Qiu — whose students populate DeepSeek, Qwen, Moonshot, Zhipu). Deck 09 picks up the China side.
One way to read modern LLM research is as a small, dense advisor graph. This slide draws the rough shape, anchored on Hinton, Bengio and LeCun.
Hinton → Sutskever (OpenAI / SSI), Krizhevsky, Salakhutdinov (CMU), Graves (DeepMind), Mnih (DeepMind), Hassabis (collaborator).
Through students: Sutskever → OpenAI co-founders; Salakhutdinov → CMU lineage of dozens of PhDs; Graves → LSTM-era DeepMind cohort.
Bengio → Goodfellow, Bahdanau, Cho (postdoc), Larochelle (Cohere), Courville (Mila), Vincent (Mila), Mensch (Mistral co-founder).
Through students: Goodfellow → Apple/DeepMind; Cho → NYU lineage; Mensch → Mistral and the European frontier-lab line.
LeCun → Sumit Chopra, Marc'Aurelio Ranzato, Camille Couprie, several Bell Labs collaborators, Kavukcuoglu (NYU PhD → DeepMind CTO).
FAIR alumni: Mike Lewis (Anthropic), Luke Zettlemoyer (UW/FAIR), Naman Goyal (Mistral), Marie-Anne Lachaux (Mistral), Tim Lacroix (Mistral co-founder).
Look at any frontier lab's senior research staff and you can almost always trace each of them, in two or fewer hops, to one of the three godfathers. Hinton's descendants cluster at OpenAI, Google DeepMind and CMU. Bengio's cluster at Mila, Mistral, Cohere, and Anthropic's safety-leaning alignment team. LeCun's cluster at Meta AI, Mistral (notable: both Bengio and LeCun feed Mistral) and a scattering of CV-flavoured labs.
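To make the two-hop claim concrete, here is a minimal sketch in Python, purely illustrative: a toy advisor graph built from a few names on this slide plus one hypothetical second-generation student, and a breadth-first search that reports how many advisor hops separate each person from one of the three godfathers. The edge list is deliberately incomplete; it is a reading aid, not a dataset.

```python
# Toy sketch of the advisor graph described above. The edge list is
# illustrative and incomplete; "CMU PhD" is a hypothetical stand-in for
# Salakhutdinov's CMU lineage (two hops from Hinton).
from collections import deque

GODFATHERS = {"Hinton", "Bengio", "LeCun"}

# person -> advisors / close mentors (toy data only)
ADVISED_BY = {
    "Sutskever": ["Hinton"],
    "Salakhutdinov": ["Hinton"],
    "CMU PhD": ["Salakhutdinov"],   # hypothetical second-generation student
    "Cho": ["Bengio"],
    "Goodfellow": ["Bengio"],
    "Ranzato": ["LeCun"],
}

def hops_to_godfather(person, max_hops=2):
    """Breadth-first search up the advisor edges; return the fewest hops
    from `person` to any godfather within `max_hops`, or None."""
    frontier = deque([(person, 0)])
    seen = {person}
    while frontier:
        name, dist = frontier.popleft()
        if name in GODFATHERS:
            return dist
        if dist == max_hops:
            continue
        for advisor in ADVISED_BY.get(name, []):
            if advisor not in seen:
                seen.add(advisor)
                frontier.append((advisor, dist + 1))
    return None

for person in ADVISED_BY:
    print(f"{person}: {hops_to_godfather(person)} hop(s) to a godfather")
```

The real graph is of course denser and messier than this, but the exercise is the point: with a complete edge list, the claim above reduces to a reachability check within two hops.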
The university labs are still excellent. The reasons they are no longer the centre of frontier work are structural and unlikely to reverse.
Stanford CRFM is the closest existing example of an academic group that has stayed strategically relevant in the LLM era. It does so by (a) not competing on frontier capabilities, (b) building public benchmarks the industry actually uses, and (c) collaborating directly with frontier labs. It is the template most other academic centres are quietly trying to copy.