ARM NEOVERSE · PRESENTATION 05

The Ecosystem

Graviton · Ampere · Grace · Cobalt · Rhea · A64FX · Yitian · BlueField · Total Design
Hyperscalers · HPC · Telcos · DPUs · Custom-core licensees
02

Where Neoverse Actually Ships

  • Hyperscale cloud CPUs — AWS Graviton, Microsoft Cobalt, Google Axion, Alibaba Yitian, Oracle OCI Ampere A1
  • HPC / AI supercomputers — NVIDIA Grace, SiPearl Rhea, Fujitsu MONAKA, HPE ProLiant / Eviden Sequana
  • General-purpose server silicon — Ampere Altra, Altra Max, AmpereOne (custom), HiSilicon Kunpeng 920
  • DPUs & SmartNICs — NVIDIA BlueField (N1/N2), Marvell Octeon 10 (N2), AWS Nitro (N1)
  • 5G + edge — Marvell Octeon Fusion, Qualcomm Cloud AI 100 host (N1), various 5G basebands (E1/E2)
  • Storage appliances — NetApp, Dell PowerStore, and Pure Storage all shipped Arm-backed appliances using Neoverse CPUs in 2024

Market shape (2024)

Arm server CPU share is ~15-20% by shipment volume (Omdia, Gartner estimates); ~50%+ of new EC2 capacity at AWS; Azure targets ~20% Cobalt by 2026.

Hyperscaler silicon self-sufficiency

AWS, Microsoft, Google, Alibaba all now design their own Neoverse-based server CPUs. Cuts Intel/AMD margin & aligns silicon roadmap with cloud workload trends.

03

AWS Graviton 1 → 4

Gen          | Year | Cores               | Core IP                 | Memory         | PCIe
Graviton 1   | 2018 | 16 × Cortex-A72     | A72 (pre-Neoverse)      | 8 × DDR4-2666  | Gen 3
Graviton 2   | 2019 | 64 × N1             | Neoverse N1             | 8 × DDR4-3200  | Gen 4
Graviton 3   | 2022 | 64 × V1             | Neoverse V1, SVE 2×256  | 8 × DDR5-4800  | Gen 5
Graviton 3E  | 2022 | 64 × V1 (HPC tune)  | V1, higher SVE util     | 8 × DDR5-4800  | Gen 5
Graviton 4   | 2024 | 96 × V2             | Neoverse V2             | 12 × DDR5-5600 | Gen 5

Graviton 1 was a "prototype" — proved the boot & software stack. Graviton 2 was the commercial breakout. Graviton 3 introduced SVE + chiplet packaging. Graviton 4 is Arm's current hyperscale high-water mark with 96 V2 cores + ~192 MB SLC.
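The generational memory jump in the Graviton table can be sanity-checked with peak-bandwidth arithmetic (channels × transfer rate × bus width); a quick sketch using only the figures quoted above:

```python
def peak_ddr_bw_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DDR bandwidth: channels x MT/s x bus width (64-bit = 8 B)."""
    return channels * mt_per_s * bus_bytes / 1000  # MT/s x B -> MB/s -> GB/s

graviton2 = peak_ddr_bw_gb_s(8, 3200)    # 8 x DDR4-3200  -> 204.8 GB/s
graviton4 = peak_ddr_bw_gb_s(12, 5600)   # 12 x DDR5-5600 -> 537.6 GB/s
print(graviton2, graviton4, graviton4 / graviton2)  # ~2.6x in two generations
```

Peak figures, not sustained: real-world bandwidth is lower, but the generational ratio holds.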

AWS chiplet strategy: Graviton 3/4 use a central I/O die + 7 compute chiplets. Yield + cost advantage over monolithic die at 3 nm/5 nm.
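The yield argument can be made concrete with a simple Poisson defect model (yield ≈ e^(−area × defect density)); the defect density and die areas below are illustrative assumptions, not AWS or foundry figures:

```python
import math

def die_yield(area_mm2: float, d0: float) -> float:
    """Poisson yield model: fraction of dies with zero defects at density d0 (/mm^2)."""
    return math.exp(-area_mm2 * d0)

D0 = 0.001  # illustrative defect density (assumption, not a foundry figure)
mono  = die_yield(600, D0)       # one 600 mm^2 monolithic die: ~55% good
small = die_yield(600 / 7, D0)   # each ~86 mm^2 compute chiplet: ~92% good

# With known-good-die testing you scrap only bad chiplets, so expected
# wasted silicon per working CPU drops sharply:
waste_mono = 600 * (1 / mono - 1)             # mm^2 scrapped per good monolithic die
waste_chip = (600 / 7) * 7 * (1 / small - 1)  # mm^2 scrapped per 7 good chiplets
print(f"{mono:.2f} {small:.2f} {waste_mono:.0f} {waste_chip:.0f}")
```

The same total compute area, split into seven dies, wastes roughly an order of magnitude less silicon under this model — the core of the chiplet cost case.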

04

Ampere Altra → AmpereOne

  • Ampere Altra (Quicksilver, 2020) — 80 × N1 @ 3.0 GHz, 8 × DDR4-3200, PCIe Gen 4.
  • Ampere Altra Max (Mystique, 2021) — 128 × N1 @ 3.0-3.3 GHz. The densest single-socket N1 ever shipped.
  • AmpereOne "Siryn" (2023) — pivoted to Ampere's own custom-core Armv8.6-A (ex-Nuvia-style team). 192 cores, SMT-capable, 8 × DDR5-4800, PCIe Gen 5. Still Arm ISA but NOT Neoverse IP.
  • Deployment: Oracle OCI Ampere A1 Flex (Altra), Azure Dps_v6 (AmpereOne previews), Hetzner CAX, Equinix Metal.
  • Ampere was the first mover on Arm servers for commercial cloud — proof that Neoverse could ship outside AWS.

Why Ampere went custom

Cortex-A710 / N2 cores were late for AmpereOne's 2023 window, and Nuvia alumni wanted to prove a ground-up core could hit higher single-thread than N2. Result: AmpereOne reports ~1.2 × single-thread over N2 on SPECint at similar power.

Classic vs modern Ampere

Altra / Max = classic N1 cloud scale-out. AmpereOne = premium custom-core targeting Ampere's own customers (Oracle, smaller clouds). Two lines continue in parallel.

05

NVIDIA Grace & Grace-Hopper / Blackwell

  • Grace CPU: 72 × Neoverse V2 cores + up to 480 GB LPDDR5X on-package per die.
  • Grace Superchip (two dies in one module) = 144 cores + 960 GB LPDDR5X + ~1 TB/s aggregate mem BW.
  • NVLink-C2C — 900 GB/s coherent link between Grace CPU and Hopper/Blackwell GPU. Allows GPU to address CPU LPDDR5X coherently.
  • Grace Hopper GH200 — CPU + H100 GPU in one SXM5 module. Used in HPE Cray EX255a, Eviden, Jupiter Booster.
  • Grace Blackwell GB200 (2024) — CPU + 2 × B200 GPUs per module, shipping in NVIDIA Quantum / Spectrum-X reference AI supercomputers.
  • Positioned as: "the CPU that unblocks the GPU" — high-bandwidth memory bridge for LLM-class workloads.
[Diagram] Grace-Hopper Superchip: Grace CPU (72 × Neoverse V2, 480 GB LPDDR5X) ↔ NVLink-C2C (900 GB/s, coherent) ↔ Hopper H100 (132 SMs, 96 GB HBM3)
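To see why 900 GB/s matters, compare against a conventional PCIe GPU attach; a back-of-envelope sketch (PCIe Gen5 x16 ≈ 64 GB/s per direction is the standard figure; times are for streaming the full 480 GB LPDDR5X pool once):

```python
cpu_mem_gb = 480   # Grace on-package LPDDR5X
nvlink_c2c = 900   # GB/s, coherent CPU<->GPU link
pcie5_x16  = 64    # GB/s per direction, conventional GPU attach

t_nvlink = cpu_mem_gb / nvlink_c2c   # ~0.53 s to stream the whole pool
t_pcie   = cpu_mem_gb / pcie5_x16    # ~7.5 s over PCIe Gen5 x16
print(f"{t_nvlink:.2f} s vs {t_pcie:.2f} s -> {t_pcie / t_nvlink:.0f}x")
```

For LLM-class workloads whose weights and KV caches spill past HBM, that ~14× gap is the "CPU that unblocks the GPU" pitch in one number.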
06

Microsoft Cobalt & Google Axion

  • Microsoft Cobalt 100 (2024) — 128 × N2 cores, CMN-700, DDR5-4800, custom Azure data-path IP, deployed at scale in Azure for "general purpose" VMs (Dpsv6/Epsv6 series).
  • Reported 40% better perf/$ than comparable x86 on Azure Functions + CosmosDB workloads.
  • Cobalt 200 (rumoured 2025) — N3 / V3-class on TSMC 3 nm via Arm CSS.
  • Google Axion (2024) — Google's first Arm server chip, based on Neoverse V2. Announced at Google Cloud Next 2024. Deployed into Hyperdisk storage tier + general-purpose VMs in limited regions.
  • Both Microsoft and Google cite the same reasons: perf/W at scale, roadmap control, reduced x86 dependency.

Why the sudden push

Power is the binding constraint on a 2024 datacentre — not capex, not land. Arm cores at 1-1.5 W/core let you fit more compute in the same MW envelope. Every 10% perf/W wins directly on capacity.
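The MW-envelope argument is simple division; a sketch with illustrative per-core powers (the x86 figure and the 35% facility overhead are assumptions for comparison, not vendor data):

```python
def cores_per_mw(watts_per_core: float, overhead: float = 0.35) -> int:
    """Cores that fit in 1 MW of facility power, reserving a fraction
    for cooling/uncore/NIC overhead (0.35 is an illustrative assumption)."""
    usable_w = 1_000_000 * (1 - overhead)
    return int(usable_w / watts_per_core)

arm = cores_per_mw(1.5)  # ~433k cores/MW at 1.5 W/core
x86 = cores_per_mw(4.0)  # ~162k cores/MW at an assumed 4 W/core
print(arm, x86, arm / x86)
```

At a fixed MW budget the per-core wattage translates directly into sellable vCPU capacity, which is why every perf/W point wins.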

AWS is still ahead

AWS has been at this since 2018. Azure & Google are "Graviton 2-generation" in 2024 — about 4 years behind. But the catch-up is accelerating.

07

Alibaba Yitian 710 & Chinese Arm

  • Alibaba Yitian 710 (2021) — 128 × N2 cores @ 3.2 GHz, early adopter of N2, 8 × DDR5, PCIe 5. Deployed in Alibaba Cloud ECS g8y.
  • Historically notable: Yitian 710 beat Arm's own public N2 availability by ~12 months.
  • HiSilicon Kunpeng 920 (2019) — 64 × custom Armv8.2-A TaiShan V110 cores (similar perf to N1). Used by Huawei Cloud and (historically) by Chinese government servers.
  • Kunpeng 920 was affected by US Entity List restrictions on TSMC fabrication — HiSilicon pivoted to SMIC and designs have not been publicly refreshed since 2022.
  • Phytium D2000/S2500 — Chinese domestic Arm server chip, shipped to government / SOE.

Export controls

Arm IP itself is generally licensable to Chinese companies. The problem is fabrication access — TSMC / Samsung leading-node. Hence Chinese Neoverse customers constrained to older nodes (7-16 nm).

Alibaba as early Neoverse canary

Yitian 710 put 128 × N2 into production data centres before anyone else. Gave Arm early field-hardening for N2 → valuable for downstream Graviton 4 and Cobalt 100 customers.

08

HPC & Research: A64FX, SiPearl Rhea, Fujitsu MONAKA

  • Fujitsu A64FX (2019) — custom Armv8.2-A with SVE1 512-bit. 48 compute + 4 assistant cores per PE. HBM2.
  • Powered Fugaku (158,976 nodes, #1 Top500 2020-21). Proved Arm at supercomputing scale.
  • Fujitsu MONAKA (planned ~2027) — successor: Armv9-A, ~150 cores, SVE2, CXL support. Targets Fugaku-NEXT-class systems.
  • SiPearl Rhea1 — 80 × V1 cores, HBM2e + DDR5, reportedly fabbed at TSMC; targets the JUPITER exascale system at Jülich.
  • ETRI / Korea KAIST — Neoverse-based research silicon for Korean HPC independence.
  • Takeaway: HPC is a prestige market + SVE / SVE2 proving ground. Commercial volumes are modest; publicity is large.

A64FX is still special

A64FX's 512-bit SVE1 is still the widest SVE anyone has shipped. Neoverse V1 went 2×256; V2/V3 went 4×128. No server-class successor to A64FX's 512-bit has shipped yet — though MONAKA may bring SVE2-at-scale.
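The reason those width differences don't fragment the software base is that SVE is vector-length agnostic: the same binary loop runs on 128-, 256-, or 512-bit hardware. A Python sketch of the WHILELT-style predicated-loop semantics (an emulation of the concept, not real SVE code):

```python
def sve_style_sum(data, vector_lanes):
    """Emulate an SVE predicated loop: process `vector_lanes` elements per
    iteration, with a WHILELT-style predicate masking the ragged tail."""
    total, i, n = 0, 0, len(data)
    while i < n:
        # predicate: lane k active while i + k < n (like WHILELT)
        pred = [i + k < n for k in range(vector_lanes)]
        chunk = [data[i + k] if p else 0 for k, p in enumerate(pred)]
        total += sum(chunk)
        i += vector_lanes
    return total

data = list(range(10))
# identical result whether the hardware has 2, 4, or 8 lanes per vector —
# no per-width recompile, which is exactly the A64FX/V1/V2 portability story
assert sve_style_sum(data, 2) == sve_style_sum(data, 8) == sum(data)
```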

Rhea1 — EU sovereignty

Europe's strategic answer to x86 dominance + TSMC dependency. Uses V1, not custom — SiPearl deliberately chose the Arm IP route to minimise risk.

09

DPUs & SmartNICs

  • NVIDIA BlueField-3 (2022) — 16 × A78 (not yet full Neoverse), 400 Gbps Ethernet, DDR5, OVS / storage virtualization offload.
  • NVIDIA BlueField-4 (2024) — Neoverse N2-based, double the cores, hardware path for the DOCA software stack.
  • Marvell Octeon 10 (2022) — up to 24 × N2 cores + crypto accelerators + ML engines. 400 GbE, 5G baseband target.
  • AWS Nitro — custom silicon running Annapurna-designed Arm cores (variously N1/N2 class). Offloads EC2 hypervisor control plane, storage, networking.
  • Function: move infra-plane work off the host CPU → tenant gets all host cycles, hypervisor runs on DPU.

The DPU thesis

As network speeds pushed past 100 Gbps, software networking on x86 cores became a tax (~30% CPU on a typical cloud VM host). DPUs took it back with Arm cores + purpose-built offload. Arm is the core IP of choice because perf/W is what matters on a PCIe card.
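The "tax" is easiest to see as a cycle budget per packet; standard Ethernet framing numbers, with an illustrative 3 GHz core clock:

```python
def min_frame_pps(link_bps: float) -> float:
    """Worst-case packet rate at line rate with 64 B minimum frames
    (+ 20 B preamble/IFG on the wire = 84 B per frame slot)."""
    return link_bps / (84 * 8)

pps = min_frame_pps(100e9)      # ~148.8 Mpps at 100 GbE
cycles_per_pkt = 3.0e9 / pps    # ~20 cycles/packet at 3 GHz (assumed clock)
print(f"{pps / 1e6:.1f} Mpps, {cycles_per_pkt:.0f} cycles/packet")
```

Twenty cycles is not enough for a software datapath on a general-purpose core, hence fixed-function offload with Arm cores left to run the control plane.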

Neoverse N2 on DPU

Same core as Microsoft Cobalt 100, but clocked lower (2.0-2.5 GHz) and with much smaller L3. Used as "control plane" compute beside fixed-function accelerators.

10

Interactive — Explore a Neoverse Silicon

Graviton 2 · Graviton 3 · Graviton 4 · Altra Max · AmpereOne · NVIDIA Grace · MS Cobalt 100 · Yitian 710 · SiPearl Rhea1 · Fujitsu A64FX
11

The Custom-Core Detour

Company          | Core                          | Launch           | Status
Apple            | Firestorm → Everest → M4 gen  | 2013 (A7) →      | Massive ongoing investment — laptops, phones, servers (internal)
Ampere           | AmpereOne "Siryn" (192c)      | 2023             | Commercial cloud server CPU
Qualcomm         | Nuvia Oryon                   | 2024             | Laptops (X Elite), data centre (planned)
Marvell          | ThunderX3                     | 2021 (cancelled) | Marvell shifted to Neoverse N2 for Octeon 10
Huawei HiSilicon | TaiShan V110                  | 2019             | Affected by export controls; refreshes constrained
AWS              | (internal to Graviton)        | —                | AWS designs Graviton but uses Neoverse IP cores — no custom core yet

Architectural Licensees get the Arm ISA; they implement their own microarchitecture. The advantage is product differentiation; the cost is ~3-5 years of CPU team build-out. Arm's CSS + Neoverse deliverables exist precisely so customers can skip that.

12

Arm Total Design & CSS (2024)

  • Launched Oct 2023. A program to make Neoverse-based chiplet design accessible to non-hyperscalers.
  • Bundles:
    • Arm Compute Sub-System (CSS) — pre-integrated Neoverse + CMN + GIC + SMMU + CoreSight
    • Partner PDK drops for TSMC, Samsung Foundry
    • EDA flow templates (Cadence, Synopsys, Siemens)
    • Package / UCIe IP from partners (Alphawave, Rambus)
  • First public CSS adopters: Microsoft Cobalt 200 (reported), Vector Silicon, Rebellions (Korean AI chip), Lexra.
  • Goal: bring Neoverse-silicon time-to-market down from ~3 years to ~12 months.

Why this matters commercially

Arm's 2023 IPO prospectus flagged that 75%+ royalty revenue is concentrated in 4-5 licensees. Total Design aims to broaden the Neoverse customer base to dozens of mid-size silicon companies.

CSS as a revenue reshape

Instead of per-core royalty on raw Neoverse IP, Arm now prices CSS as a subsystem. Higher per-chip fee, but saves customers millions in integration + verification.

13

Software Ecosystem — Post-2022 Maturity

  • OS distributions: Ubuntu, RHEL, Rocky, SUSE, Debian, Fedora all ship first-class arm64 release builds on the same cadence as x86-64.
  • Kernels: mainline Linux, FreeBSD, and OpenBSD all support Neoverse well. Azure & AWS run near-vanilla upstream kernels + small cloud patches.
  • Runtimes: JVM (HotSpot, OpenJDK, GraalVM), .NET 8, Go, Node.js, Python, Rust, Swift all produce optimised AArch64 builds.
  • Databases: Postgres, MySQL, MariaDB, Redis, Memcached, ClickHouse, Cassandra, Kafka all tested and tuned for Neoverse.
  • ML frameworks: PyTorch + TorchInductor, ONNX Runtime, TensorFlow, JAX — all have mature Neoverse paths, with SVE2 used in inner loops.
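"Tested in CI, shipping as a first-class artefact" in practice means every build pipeline normalises the machine string and publishes per-arch artifacts; a minimal sketch (the alias table follows common convention, e.g. Go's GOARCH naming — it is not any specific project's code):

```python
import platform

# common `uname -m` spellings -> artifact arch tag (GOARCH-style convention)
_ARCH_ALIASES = {
    "aarch64": "arm64", "arm64": "arm64",  # Neoverse servers / Apple silicon
    "x86_64": "amd64", "amd64": "amd64",
}

def artifact_arch(machine=None):
    """Normalise a machine string (default: this host's) for release naming."""
    m = (machine or platform.machine()).lower()
    try:
        return _ARCH_ALIASES[m]
    except KeyError:
        raise ValueError(f"no release artifact for arch {m!r}")

assert artifact_arch("aarch64") == "arm64"  # what a Graviton/Cobalt VM reports
assert artifact_arch("x86_64") == "amd64"
```

The 2020 inflection is essentially this function appearing in everyone's release scripts, with an `arm64` wheel/deb/tarball behind each tag.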

The 2020 inflection

Around 2020-21 the software ecosystem flipped from "Arm support exists but it's a second-class citizen" to "Arm tested in CI, shipping as a first-class artefact". That's when Graviton went from "early adopter" to default choice.

The one soft spot

Legacy proprietary enterprise software — certain ERP, hedge-fund risk models, specific HPC ISV code — still lags on Arm. Gradually closing as Azure + Google add Arm VMs.

14

Where Arm Servers Are Going

  • Arm share of hyperscale CPU growing to 30-40% by 2026 (Omdia). AWS already >50% new capacity; Azure catching up with Cobalt; Google via Axion; Alibaba via Yitian.
  • Chiplets + UCIe universal — Neoverse CSS S3 + chiplet packaging turns server CPU design into mix-and-match.
  • CXL-enabled memory pooling — CMN S3 + CXL 3.0 mean a server can dynamically add/remove memory from a rack-level pool.
  • CCA / Realms go mainstream — cloud tenants using confidential computing on Arm by default on Azure + AWS by 2026.
  • CPU AI inference — N3 / V3 bf16/INT8 matmul competes with GPU for small model + edge inference.
  • On-device generative AI — SME on Cortex-X + Neoverse-class silicon in handsets with ~10 W budget.
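The CPU-inference bullet rests on int8 matmul throughput (SDOT/I8MM-class instructions); a minimal sketch of the symmetric int8 quantisation those paths execute before the integer dot products run:

```python
def quantize_int8(xs):
    """Symmetric per-tensor int8 quantisation: x ~ scale * q, q in [-127, 127]."""
    scale = max(abs(x) for x in xs) / 127 or 1.0  # avoid scale 0 for all-zero input
    return [round(x / scale) for x in xs], scale

def dequantize(q, scale):
    return [v * scale for v in q]

xs = [0.1, -0.5, 0.25, 1.0]
q, s = quantize_int8(xs)
approx = dequantize(q, s)
# rounding error is bounded by half a quantisation step, i.e. < scale
assert all(abs(a - b) < s for a, b in zip(xs, approx))
```

The dot products then run 4-at-a-time on integer units, which is where the N-class/V-class cores pick up their small-model inference throughput.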

The Arm hire pitch

If you can do Linux kernel arm64, performance analysis with Arm PMU/SPE, or silicon bring-up / SystemReady, you're employable by the hyperscalers, Arm, Ampere, NVIDIA, Marvell, or any one of dozens of CSS licensees.

Skills that transfer

Cortex-A knowledge transfers directly to Neoverse (same cores). SVE2 code runs identically on Cortex-X for phones and Neoverse V for servers. This shared microarchitecture is Arm's force multiplier.

15

Lessons

  • "What makes Graviton special?" → N1-class Arm cores, custom AWS silicon design, matches/beats x86 perf at ~60% of cost on cloud-native workloads. Now at Gen 4 with V2 cores.
  • "What is NVLink-C2C?" → NVIDIA's coherent CPU-GPU link on Grace Hopper. 900 GB/s. Allows GPU to address CPU LPDDR5X as if it were HBM.
  • "Difference between Ampere Altra and AmpereOne?" → Altra = Neoverse N1 (Arm IP, 80-128 cores). AmpereOne = custom Armv8.6-A core from ex-Nuvia team (192 cores).
  • "Who is first to N3 / V3?" → Microsoft Cobalt 200 and AWS Graviton 5 are both reported to be N3/V3-class; both expected 2025.
  • "What does a DPU need Arm for?" → control plane compute alongside fixed-function network/storage offload. Neoverse N-class cores give perf/W that PCIe-card envelopes (~75 W) require.
  • "Why the A64FX special mention?" → custom Armv8.2-A, first-ever SVE implementation (512-bit), powered Fugaku. Proved Arm at supercomputing scale before Neoverse existed commercially.
16

References

AWS — Graviton 2 / 3 / 3E / 4 whitepapers at aws.amazon.com/ec2/graviton
NVIDIA — Grace + Grace Hopper architecture whitepaper (2023-24)
Microsoft Azure — Cobalt 100 announcement + launch blog posts (Ignite 2023 / 2024)
Google Cloud — Axion announcement, Google Cloud Next 2024
Ampere Computing — AmpereOne technical briefs; Altra Max product pages
Alibaba Cloud — Yitian 710 engineering announcement papers
Fujitsu — A64FX technology white paper (2019); MONAKA (2024 ISSCC paper)
SiPearl / EuroHPC — Rhea1 announcements; JUPITER exascale documentation
Arm Ltd. — Neoverse annual Tech Day transcripts; Arm Total Design / CSS press kits
ServeTheHome, Phoronix, Chips and Cheese — independent benchmarking + architecture write-ups

Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.