ARM NEOVERSE · PRESENTATION 01

History & Product Lines

From "Cortex-A for servers" to a distinct infrastructure brand
Cosmos · Ares · Zeus · Poseidon · N1 · V1 · E1 · N2 · V2 · N3 · V3 · CSS
02

Prehistory — Arm in the Server Room (2012-2018)

  • Calxeda EnergyCore (2011-13) — first serious Arm server SoC (Cortex-A9, fabric-attached). Too early; Calxeda shut down in 2013.
  • AMD Opteron A1100 "Seattle" (2016) — 8 × Cortex-A57. AMD's first and last attempt at Arm server; cancelled after Zen prioritisation.
  • Qualcomm Centriq 2400 (2017) — 48 × custom Falkor cores. Technically excellent, strategically abandoned in 2018.
  • Marvell ThunderX / ThunderX2 (Cavium) — 32-48 × custom cores, shipped in Cray XC50 + Astra supercomputer.
  • Ampere eMAG (2018) — 32 × custom X-Gene, the first "real" commodity Arm server silicon.
  • Every one of these built their own core. No "Neoverse" yet: Arm provided licences but no server-optimised IP.
"The question in 2017 wasn't whether Arm could do a server. It was whether anyone other than Arm's custom licensees could build one." — infrastructure-industry view, paraphrased

Arm's answer: launch Neoverse as a dedicated infrastructure roadmap in October 2018, re-characterising Cortex-A76 for servers and committing to an annual cadence.

03

The 2018 Neoverse Announcement

  • October 16, 2018 — Arm TechCon, San Jose.
  • Arm unveiled three roadmaps under one brand:
    • Cosmos — available now (2018). = Cortex-A72 / A75 repackaged for server.
    • Ares — 2019. 7 nm, first "Neoverse-native" — became N1.
    • Zeus — 2020. First to carry a non-mobile, server-only design. Became V1.
    • Poseidon — 2021 roadmap slot, v9-A. The codename ultimately landed on V3; N2 (Perseus) and V2 (Demeter) filled the intervening slots.
  • Promise: ~30% performance uplift per generation — met or exceeded through the first ~5 years (N1→V1→V2 reportedly ≈ 50% per gen on SPECrate).
  • Three target profiles announced in 2020:
    • N — "balanced" scale-out server cores
    • V — "performance" wide cores for HPC
    • E — "efficiency" edge/networking cores

Why three profiles?

Mirrors how Arm served mobile with Cortex-A/R/M. A cloud platform wants dozens of N-cores per socket for scale-out; an HPC system wants fewer V-cores with wider SIMD; a 5G baseband or DPU wants E-cores at lowest power.

Annual cadence

Delivered every year since 2019 — N1/E1 (2019), V1 (2020-21), N2 (2021), V2 (2022), N3/V3 (2024). A faster cadence than x86, driven by shared microarchitecture with the Cortex-A flagships.
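The roadmap promise compounds quickly. A quick sanity check — illustrative arithmetic only, not measured data:

```python
# Compounding the roadmap's per-generation promise.
# Illustrative arithmetic only -- measured SPECrate gains varied by generation.

def compounded_gain(per_gen_gain: float, generations: int) -> float:
    """Cumulative speedup after `generations` steps of `per_gen_gain` each."""
    return (1.0 + per_gen_gain) ** generations

print(f"{compounded_gain(0.30, 5):.2f}x after five +30% generations")
```

Five generations at the promised rate would multiply per-core performance almost fourfold, which is why an annual cadence matters more than any single generation's gain.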

04

Neoverse N1 (Ares, 2019)

  • Under the hood: a Cortex-A76 core, re-validated for server use.
  • 4-wide decode, 8-wide issue, 128-bit NEON. No SVE.
  • Private L2 of up to 1 MB, LLC (via CMN-600 mesh) 1-2 MB/core.
  • Server-specific:
    • RAS: ECC on L1/L2/L3, poison handling
    • Single-bit error correction without traps
    • Larger TLBs, server-tuned prefetchers
  • First major deployment: AWS Graviton 2 (announced December 2019) — 64 cores, 7 nm TSMC. AWS reports it became the basis of roughly half of new EC2 capacity within three years.
  • Also shipped in: Ampere Altra (80 → 128 cores), Oracle OCI A1 Flex.
Neoverse N1 (2019)
Architecture ...... Armv8.2-A
Derived from ...... Cortex-A76
Decode / Issue .... 4 / 8 wide
SIMD .............. NEON 128-bit
L1-I / L1-D ....... 64 KB / 64 KB
L2 (private) ...... up to 1 MB
Process ........... 7 nm TSMC
First silicon ..... Graviton 2 (12/2019)
05

Neoverse V1 (Zeus, 2020)

  • Derived from Cortex-X1 — Arm's first big-and-wide core. Added SVE for HPC.
  • 2 × 256-bit SVE FP pipelines — the widest SVE datapath Arm itself has shipped (Fujitsu's custom A64FX implements 512-bit SVE, but is not an Arm design).
  • 5-wide decode, larger ROB, deeper OoO, more aggressive prefetch.
  • No SMT — V1, like every N- and V-series Neoverse, runs one thread per core; within Neoverse only E1 implements SMT2.
  • Private L2 up to 2 MB.
  • First silicon: AWS Graviton 3 (2022) — 64 cores on a monolithic compute die, packaged with four DDR5 and two PCIe chiplets (seven dies total).
  • Also: SiPearl Rhea1 — V1-based European HPC chip for the Jupiter exascale machine (2025).

V1 is the HPC play

HPC and analytics codes vectorise well; Neoverse V1's 2 × 256-bit SVE delivers roughly 2 × the per-core FP throughput of N1 on BLAS-like kernels, with comparable area to a desktop X1.
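The FP gap falls out of simple lane arithmetic. A hedged sketch, assuming N1 has two 128-bit FMA-capable NEON pipes (a plausible reading of the Cortex-A76 pipeline it derives from) against V1's two 256-bit SVE pipes from this slide:

```python
# Peak FP64 FLOPs/cycle from SIMD lane arithmetic.
# Assumptions, not vendor-published peak figures: N1 = two 128-bit
# FMA-capable NEON pipes; V1 = two 256-bit SVE pipes.

def peak_fp64_flops_per_cycle(pipes: int, vector_bits: int) -> int:
    """Peak FP64 FLOPs per cycle: pipes x 64-bit lanes x 2 (FMA = mul+add)."""
    lanes = vector_bits // 64
    return pipes * lanes * 2

n1 = peak_fp64_flops_per_cycle(pipes=2, vector_bits=128)
v1 = peak_fp64_flops_per_cycle(pipes=2, vector_bits=256)
print(f"N1 {n1} vs V1 {v1} FP64 FLOPs/cycle -> {v1 / n1:.0f}x")
```

At equal clocks this gives the ~2× per-core FP ratio; real BLAS kernels land below peak once memory bandwidth enters the picture.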

SMT on Arm servers

x86 has shipped 2-way SMT since the Pentium 4, but Neoverse goes the other way: N- and V-series cores are single-threaded, betting that many physical cores beat SMT threads for cloud isolation and predictability. Only the little E1 uses SMT2, to hide packet-processing stalls.

06

Neoverse E1 (2019) — the Edge Sibling

  • E = Efficiency. Closely related to Cortex-A65AE — a small dual-issue core with limited out-of-order execution.
  • Designed for 5G base stations, NICs, smart switches, SD-WAN appliances — lots of packet-processing threads at low power.
  • Supports SMT2 natively — rare in a core this small, but it hides network-packet latency.
  • Per-core: ~10× smaller than an N1, ~1/3 the power.
  • Shipping in: Marvell Octeon 10 DPU, ASR Microelectronics baseband, various OEM data-processing units (DPUs).
  • Successor E2 announced in 2022, but with fewer public shipments — edge silicon goes quietly.

Why the E tier exists

A DPU terminating millions of packets per second doesn't want huge OoO machinery; it wants many cheap SMT threads that can keep a 400 GbE pipe full. N/V-class cores would waste area and power on branch predictors the workload can't use.
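The sizing intuition above can be made concrete with Little's-law-style arithmetic. Every input here is an illustrative assumption (64 B packets, ~100 ns of memory stall per packet), not a vendor figure:

```python
# Back-of-envelope thread sizing for a 400 GbE DPU.
# All inputs are illustrative assumptions, not vendor figures.

LINK_GBPS = 400          # line rate
PKT_BYTES = 64           # worst-case small packets (framing overhead ignored)
STALL_NS_PER_PKT = 100   # assumed DRAM-bound handling time per packet

pps = LINK_GBPS * 1e9 / 8 / PKT_BYTES       # packets per second at line rate
pps_per_thread = 1e9 / STALL_NS_PER_PKT     # ceiling of one fully-stalled thread
threads_needed = pps / pps_per_thread       # hardware threads to hold line rate

print(f"{pps / 1e6:.0f} Mpps -> ~{threads_needed:.0f} stalled threads to keep up")
```

Under these assumptions a 400 GbE pipe needs on the order of 80 concurrent hardware threads — exactly the regime where many small SMT2 cores beat a few wide ones.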

Adjacent Arm IP

Ethos-N NPUs and Mali GPUs often sit on the same chip — Arm's "infrastructure CSS" reference designs bundle Neoverse-E with Ethos for combined edge-ML + packet boxes.

07

Neoverse N2 (Perseus, 2021 — Armv9-A)

  • Derived from Cortex-A710. First Armv9-A Neoverse. Mandates SVE2.
  • 4-wide decode, 2 × 128-bit SVE2 pipelines per core.
  • Significant server-specific refinements:
    • Larger BTB + refined TAGE predictor
    • Private L2 up to 1 MB
    • Support for MPAM v1 for QoS of last-level cache + memory bandwidth
    • CMN-700 mesh interconnect generation
  • Confidential Compute (RME / CCA) — first Neoverse with hardware Realms.
  • Production silicon:
    • Microsoft Cobalt 100 — 128 cores, Azure (2024)
    • Alibaba Yitian 710 — 128 cores (shipped 2021, the earliest N2 silicon)
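SVE2's defining trait is vector-length agnosticism: one binary runs correctly at whatever vector width the hardware implements. A plain-Python conceptual model of the whilelt-predicated loop structure (not real SVE intrinsics):

```python
# Conceptual model of an SVE-style vector-length-agnostic (VLA) loop:
# the hardware picks the lane count; a whilelt-style predicate masks
# the tail. Plain Python illustration, not real SVE intrinsics.

def vla_add(a: list, b: list, vl_lanes: int) -> list:
    """Elementwise add, processed `vl_lanes` lanes at a time."""
    out = [0.0] * len(a)
    i = 0
    while i < len(a):
        active = min(vl_lanes, len(a) - i)   # predicate: in-bounds lanes only
        for lane in range(active):           # one "vector op" over active lanes
            out[i + lane] = a[i + lane] + b[i + lane]
        i += vl_lanes
    return out

# Identical results whether the "hardware" is 128-bit (4 FP32 lanes)
# or 256-bit (8 lanes) -- the code never hard-codes the width.
print(vla_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 4))
```

This is why Armv9 can mandate SVE2 across N2-class (128-bit) and wider implementations without fragmenting binaries.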

N = scale out

N2's sweet spot is 64-128 cores per socket at ~2.5-3.5 GHz with DDR5 + CXL 2.0 memory. Arm and partners claim 25-45% better perf/watt than x86 Zen 4 on cloud-native stacks (nginx, Redis, Java).

V2 (Demeter, 2022)

Paired with N2: derived from Cortex-X3. 4 × 128-bit SVE2 — narrower pipes than V1's 2 × 256-bit but the same 512-bit total, so peak FLOPs match. Shipped in the NVIDIA Grace CPU (72 cores per die; the Grace Superchip pairs two for 144).

08

Neoverse N3 & V3 (2024) — the CSS Generation

  • Announced February 2024 at Arm's Neoverse Tech Day.
  • N3 — successor to N2. Key changes:
    • Improved core IPC (~20% over N2 on SPEC)
    • Better bf16 / INT8 matmul throughput for AI inference on CPU
    • Tightened DVFS curves for lower idle power
  • V3 — successor to V2. Reported as the biggest generation-on-generation jump of the line (~35% SPECint gain over V2):
    • Wider front-end, larger reorder buffer (≈384 entries reported)
    • 4 × 128-bit SVE2 with higher utilisation
    • New data-dependent prefetcher
    • CCA acceleration
  • Delivered as part of Arm Compute Subsystems (CSS) — pre-validated core + CMN + GIC + SMMU drops. Used by Microsoft Cobalt 200 and AWS Graviton 5 (reportedly).
CSS = Compute Sub-System
Neoverse cores + CMN mesh (the server-side DSU equivalent) + SLC slices
GIC-700 + SMMU-700 + MPAM
CoreSight + CHI egress

CSS is the "server-on-a-chip starter kit" — customers integrate chiplets on top, saving 1-2 years of IP integration.

09

Comparing the Families

Generation | Codename | Based on | SVE | Year | Canonical silicon
N1 | Ares | Cortex-A76 | — (NEON only) | 2019 | Graviton 2, Ampere Altra
E1 | Helios | Cortex-A65AE | — | 2019 | Marvell Octeon 10, 5G basebands
V1 | Zeus | Cortex-X1 | 2 × 256-bit | 2021 | Graviton 3 / 3E, SiPearl Rhea1
N2 | Perseus | Cortex-A710 (v9-A) | 2 × 128-bit SVE2 | 2021 | Cobalt 100, Yitian 710
V2 | Demeter | Cortex-X3 (v9-A) | 4 × 128-bit SVE2 | 2022 | NVIDIA Grace (72/144), Graviton 4
N3 | Hermes | A720-class (v9.2-A) | 4 × 128-bit SVE2 | 2024 | Azure Cobalt 200 (reported)
V3 | Poseidon | X4-class (v9.2-A) | 4 × 128-bit SVE2 | 2024 | Graviton 5 (reported), next-gen Grace

Not shown: custom-core Neoverse "cousins" like Ampere AmpereOne (192 custom cores, Armv8.6-A). The V-series trades core count for width; the N-series goes for socket density.

10

The N vs V Tradeoff

[Chart: throughput vs single-thread — cores per socket against single-thread IPC, with E1 at the many-threads extreme, N1/N2/N3 in the middle band, V1/V2/V3 at the high-IPC end]
  • E-series — low IPC, high thread count, very low power. DPUs, 5G, edge.
  • N-series — middling IPC, very high core counts per socket (up to 192). Cloud scale-out.
  • V-series — wide OoO + wide SIMD for single-thread + HPC; fewer cores per socket (typically 64-96).
  • Across customers: AWS started on N (Graviton 2) and moved to V (Graviton 3/4); Microsoft uses N for Azure Cobalt; NVIDIA uses V for Grace (HPC + AI).
11

Timeline — 2012 to 2024

2012
Calxeda EnergyCore (Cortex-A9): first serious Arm server — too early
2016
AMD Opteron A1100 (A57) ships then dies; Cavium ThunderX finds scientific use
2017
Qualcomm Centriq 2400 (Falkor custom); cancelled 2018
2018
Neoverse brand launched — Cosmos / Ares / Zeus / Poseidon roadmap
2019
Neoverse N1 (Ares) and E1 (Helios) launch; AWS Graviton 2 — the turning point
2020
Ampere Altra (80-core N1) ships
2021
Neoverse V1 (Zeus, 2 × 256-bit SVE) launches; SiPearl Rhea1 design win; Alibaba Yitian 710 ships N2 early
2022
Graviton 3 (V1 + DDR5 + PCIe Gen 5); Neoverse N2 (Perseus) generally available
2022
Neoverse V2 (Demeter) + NVIDIA Grace unveiled
2023
AmpereOne (custom-core "Siryn") ships; Arm IPO on NASDAQ
2024
Neoverse N3 / V3; Azure Cobalt 100 (N2) + AWS Graviton 4; Arm Total Design + CSS IP delivery
12

Why Neoverse Succeeded Where Earlier Attempts Failed

  • Shared microarchitecture with Cortex-A flagships — amortises R&D. Each Cortex-X/-A generation becomes a Neoverse within ~1 year.
  • Perf/Watt at hyperscale — AWS, Microsoft, Google all face power-constrained datacentres. Neoverse N1/N2 at 2.5-3 W/core vs Xeon's 5-8 W/core is decisive.
  • Software ecosystem caught up — Linux, JVM, Go, .NET, PyTorch, Postgres, Redis, Kafka all have first-class AArch64 builds since 2020.
  • SystemReady — a Linux distro boots on any compliant server without vendor-specific patches. Finally matched x86's UEFI/ACPI experience.
  • Hyperscaler self-sufficiency — AWS, Microsoft, Google now design their own Arm silicon, cutting out the middleman. Neoverse IP licences + Arm's CSS makes this practical.
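Because the toolchains above now treat AArch64 as a first-class target, portable code usually needs only a runtime check when it must special-case the architecture. A minimal probe, standard library only:

```python
import platform

def is_aarch64() -> bool:
    """True when the interpreter runs on an AArch64 (arm64) machine,
    e.g. a Graviton or Cobalt cloud instance."""
    return platform.machine().lower() in ("aarch64", "arm64")

print("running on AArch64:", is_aarch64())
```

The same check is what build scripts and CI matrices typically use to pick AArch64 wheels or container images.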

Graviton as Arm's proof-point

AWS reports Graviton now accounts for >50% of new EC2 capacity. Customers see 20-40% price/perf improvement over Intel/AMD. Hard to argue with the datacentre P&L.

Open-source alignment

Arm backed Linaro, TF-A, and TianoCore EDK2 to make sure the full server firmware stack was permissively licensed. Removed a big barrier for hyperscalers.

13

Neoverse CSS — the 2024 Shift

  • Compute Sub-System (CSS) — a pre-integrated block of cores + DSU-like cluster + CMN slice + system IP (GIC, SMMU, CoreSight).
  • Customers get a validated RTL drop ready to tile into a chiplet or SoC. Saves 12-24 months.
  • First CSS: CSS N2 (64 cores in a mesh), delivered late 2023. Azure Cobalt 100 built on it.
  • Follow-on: CSS V3 / N3 (2024). Some variants offer optional bundled HBM3 / LPDDR5X memory controllers.
  • Part of "Arm Total Design" — 30+ partner ecosystem (TSMC, Samsung Foundry, Cadence, Synopsys, Siemens EDA) aligned to help customers build chiplets.

Why CSS exists

Integrating a full Neoverse mesh is hard — physical design, RTL signoff, RAS validation. Only hyperscalers with silicon teams of 100+ engineers could afford it. CSS democratises access to Neoverse for companies with much smaller teams.

The chiplet angle

CSS maps cleanly onto a single chiplet, making Neoverse the "compute" side of UCIe-based multi-chiplet SoCs. Memory / IO / AI accelerators live on complementary chiplets.

14

Relationship to Cortex-A

  • Every shipping Neoverse is based on a Cortex-A/X flagship. But they are not identical — server re-validation adds:
    • RAS features (ECC, poison, lockstep support)
    • Larger TLBs
    • Mesh-style CHI egress (AMBA 5)
    • MPAM v1 QoS
    • Server-tuned prefetchers & branch predictors
    • Tuning for 2-3 GHz sustained vs 3.5+ GHz burst
  • The packaging is very different: Cortex-A ships as an IP drop to phone OEMs with DSU; Neoverse ships to silicon OEMs with CMN integration.

Fork timing

Typical pattern: Cortex-X/-A unveiled at Computex/May. Matching Neoverse unveiled at Arm Neoverse Tech Day (Oct-Feb) 6-12 months later. That's the validation + RAS window.

Benchmark flavours

Phone cores aim for Geekbench ST. Neoverse aims for SPECrate, STREAM, DGEMM, DB/nginx/Kafka rps. Different prefetcher tuning per target.

15

Lessons

  • "What's the difference between Neoverse N and V?" → N is scale-out (many cores, modest IPC). V is performance (wide OoO, wide SVE). E is edge (low power, small SMT2 cores).
  • "What was N1 derived from?" → Cortex-A76, re-validated with server-grade RAS, larger TLBs, CMN-600 egress.
  • "Why did Graviton win in AWS?" → ~40% better perf/$ and perf/W than contemporary x86 Xeon/Epyc on cloud-native workloads. Plus AWS owns the design, cutting Intel/AMD margin.
  • "What is CSS?" → Compute Sub-System: pre-integrated Neoverse cores + CMN + GIC + SMMU + CoreSight. Cuts chiplet/SoC integration from ~2 years to a matter of months.
  • "What SVE width does V1 ship?" → 2 × 256-bit pipelines, the widest SVE of any Neoverse. V2 and V3 moved to 4 × 128-bit for a different throughput shape.
  • "Which Neoverse cores use SMT?" → Only E1 (SMT2). Packet workloads stall on memory constantly, and SMT2 hides the stall cheaply in a tiny core. N- and V-series stay single-threaded: cloud scale-out is already parallel across cores, and SMT would fight cache pressure.
16

References

Arm Ltd. — Neoverse TRMs (N1, V1, N2, V2, N3, V3) — freely downloadable on developer.arm.com
Arm Ltd. — Neoverse Tech Day 2022 / 2024 keynotes and whitepapers
Arm Ltd. — Arm Compute Sub-Systems (CSS) product briefs
AWS — Graviton 2 / 3 / 4 performance whitepapers — aws.amazon.com/ec2/graviton
NVIDIA — NVIDIA Grace CPU Superchip architecture whitepaper
ServeTheHome / Phoronix / Anandtech — independent Neoverse benchmark reviews (2019-2024)
Chipsandcheese.com — microarchitecture deep-dives on Neoverse N1/V1/V2
SiPearl / Jupiter EuroHPC — Rhea1 architecture papers

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.