5G + edge — Marvell Octeon Fusion, Qualcomm Cloud AI 100 host (N1), various 5G basebands (E1/E2)
Storage appliances — NetApp, Dell PowerStore, Pure Storage — all shipped Arm-based appliances using Neoverse CPUs in 2024
Market shape (2024)
Arm server CPU share is ~15-20% by shipment volume (Omdia, Gartner estimates); more than 50% of new EC2 CPU capacity at AWS is Graviton; Azure targets ~20% Cobalt by 2026.
Hyperscaler silicon self-sufficiency
AWS, Microsoft, Google, Alibaba all now design their own Neoverse-based server CPUs. Cuts Intel/AMD margin & aligns silicon roadmap with cloud workload trends.
03
AWS Graviton 1 → 4
Graviton 1 (2018) — 16 × Cortex-A72 (pre-Neoverse), 8 × DDR4-2666, PCIe Gen 3
Graviton 2 (2019) — 64 × Neoverse N1, 8 × DDR4-3200, PCIe Gen 4
Graviton 3 (2022) — 64 × Neoverse V1 (SVE 2×256), 8 × DDR5-4800, PCIe Gen 5
Graviton 3E (2022) — 64 × V1 (HPC tune, higher SVE utilisation), 8 × DDR5-4800, PCIe Gen 5
Graviton 4 (2024) — 96 × Neoverse V2, 12 × DDR5-5600, PCIe Gen 5
Graviton 1 was a "prototype" — proved the boot & software stack. Graviton 2 was the commercial breakout. Graviton 3 introduced SVE + chiplet packaging. Graviton 4 is Arm's current hyperscale high-water mark with 96 V2 cores + ~192 MB SLC.
AWS chiplet strategy: Graviton 3/4 use multi-chiplet packaging — a central compute die plus separate DDR and PCIe chiplets. Yield + cost advantage over a monolithic die at 5 nm-class nodes.
04
Ampere Altra → AmpereOne
Ampere Altra (Quicksilver, 2020) — 80 × N1 @ 3.0 GHz, 8 × DDR4-3200, PCIe Gen 4.
Ampere Altra Max (Mystique, 2021) — 128 × N1 @ 3.0-3.3 GHz. The densest single-socket N1 ever shipped.
AmpereOne "Siryn" (2023) — pivoted to Ampere's own custom-designed Armv8.6-A core. 192 single-threaded cores (no SMT), 8 × DDR5-4800, PCIe Gen 5. Still the Arm ISA, but NOT Neoverse IP.
Ampere was the first mover on Arm servers for commercial cloud — proof that Neoverse could ship outside AWS.
Why Ampere went custom
Cortex-A710 / N2 cores were late for AmpereOne's 2023 window, and Nuvia alumni wanted to prove a ground-up core could hit higher single-thread than N2. Result: AmpereOne reports ~1.2 × single-thread over N2 on SPECint at similar power.
Classic vs modern Ampere
Altra / Max = classic N1 cloud scale-out. AmpereOne = premium custom-core targeting Ampere's own customers (Oracle, smaller clouds). Two lines continue in parallel.
05
NVIDIA Grace
NVLink-C2C — 900 GB/s coherent link between Grace CPU and Hopper/Blackwell GPU. Allows the GPU to address CPU LPDDR5X coherently.
Grace Hopper GH200 — CPU + H100 GPU in one SXM5 module. Used in HPE Cray EX255a, Eviden, Jupiter Booster.
Grace Blackwell GB200 (2024) — CPU + 2 × B200 GPUs per module, shipping in NVIDIA Quantum / Spectrum-X reference AI supercomputers.
Positioned as: "the CPU that unblocks the GPU" — high-bandwidth memory bridge for LLM-class workloads.
06
Microsoft Cobalt & Google Axion
Microsoft Cobalt 100 (2024) — 128 × N2 cores, CMN-700, DDR5-4800, custom Azure data-path IP, deployed at scale in Azure for "general purpose" VMs (Dpsv6/Epsv6 series).
Reported 40% better perf/$ than comparable x86 on Azure Functions + CosmosDB workloads.
Cobalt 200 (rumoured 2025) — N3 / V3-class on TSMC 3 nm via Arm CSS.
Google Axion (2024) — Google's first Arm server chip, based on Neoverse V2. Announced at Google Cloud Next 2024. Deployed into Hyperdisk storage tier + general-purpose VMs in limited regions.
Both Microsoft and Google cite the same reasons: perf/W at scale, roadmap control, reduced x86 dependency.
Why the sudden push
Power is the binding constraint on a 2024 datacentre — not capex, not land. Arm cores at 1-1.5 W/core let you fit more compute in the same MW envelope. Every 10% perf/W wins directly on capacity.
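The capacity argument above can be made concrete with a back-of-envelope calculation. All figures in this sketch (W/core, overhead fraction) are illustrative assumptions, not vendor specifications:

```python
# Back-of-envelope: cores that fit in a fixed datacentre power envelope.
# All numbers below are illustrative assumptions, not vendor specs.

def cores_per_mw(watts_per_core: float, overhead_fraction: float = 0.3) -> int:
    """Cores deployable per 1 MW, reserving a fraction for cooling/PSU losses."""
    usable_watts = 1_000_000 * (1 - overhead_fraction)
    return int(usable_watts / watts_per_core)

arm_cores = cores_per_mw(1.25)   # assumed ~1.25 W/core for an N2-class design
x86_cores = cores_per_mw(4.0)    # assumed ~4 W/core for a contemporary x86 part

print(arm_cores, x86_cores)      # 560000 vs 175000 cores per MW
```

At these assumed figures an N2-class design fits roughly 3× the cores of an x86 part in the same megawatt envelope — which is why every perf/W point translates directly into sellable capacity.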
AWS is still ahead
AWS has been at this since 2018. Azure & Google are "Graviton 2-generation" in 2024 — about 4 years behind. But the catch-up is accelerating.
07
Alibaba Yitian 710 & Chinese Arm
Alibaba Yitian 710 (2021) — 128 × N2 cores @ 3.2 GHz, early adopter of N2, 8 × DDR5, PCIe 5. Deployed in Alibaba Cloud ECS g8y.
Historically notable: Yitian 710 beat Arm's own public N2 availability by ~12 months.
HiSilicon Kunpeng 920 (2019) — 64 × custom Armv8.2-A TaiShan V110 cores (similar perf to N1). Used by Huawei Cloud and (historically) by Chinese government servers.
Kunpeng 920 was affected by US Entity List restrictions on TSMC fabrication — HiSilicon pivoted to SMIC and designs have not been publicly refreshed since 2022.
Phytium D2000/S2500 — Chinese domestic Arm server chip, shipped to government / SOE.
Export controls
Arm IP itself is generally licensable to Chinese companies. The problem is fabrication access — TSMC / Samsung leading-node. Hence Chinese Neoverse customers constrained to older nodes (7-16 nm).
Alibaba as early Neoverse canary
Yitian 710 put 128 × N2 into production data centres before anyone else. Gave Arm early field-hardening for N2 → valuable for downstream Graviton 4 and Cobalt 100 customers.
SiPearl Rhea1 (2025) — 80 × V1 cores, HBM2e + DDR5; fab node not publicly confirmed. Targets the JUPITER exascale system at Jülich.
ETRI / Korea KAIST — Neoverse-based research silicon for Korean HPC independence.
Takeaway: HPC is a prestige market + SVE / SVE2 proving ground. Commercial volumes are modest; publicity is large.
A64FX is still special
A64FX's 512-bit SVE is still the widest SVE implementation anyone has shipped. Neoverse V1 went 2×256-bit; V2/V3 went 4×128-bit — the same 512 bits of aggregate datapath, split into narrower pipes. No server-class successor to A64FX's 512-bit vectors has shipped yet — though MONAKA may bring SVE2 at scale.
Rhea1 — EU sovereignty
Europe's strategic answer to x86 dominance + TSMC dependency. Uses V1, not custom — SiPearl deliberately chose the Arm IP route to minimise risk.
NVIDIA BlueField-4 (2024) — Neoverse N2-based, double the cores, hardware path for the DOCA software stack.
Marvell Octeon 10 (2022) — up to 24 × N2 cores + crypto accelerators + ML engines. 400 GbE, 5G baseband target.
AWS Nitro — custom silicon running Annapurna-designed Arm cores (variously N1/N2 class). Offloads EC2 hypervisor control plane, storage, networking.
Function: move infra-plane work off the host CPU → tenant gets all host cycles, hypervisor runs on DPU.
The DPU thesis
As network speeds pushed past 100 Gbps, software networking on x86 cores became a tax (~30% CPU on a typical cloud VM host). DPUs took it back with Arm cores + purpose-built offload. Arm is the core IP of choice because perf/W is what matters on a PCIe card.
Neoverse N2 on DPU
Same core as Microsoft Cobalt 100, but clocked lower (2.0-2.5 GHz) and with much smaller L3. Used as "control plane" compute beside fixed-function accelerators.
AWS — internal to Graviton — designs Graviton itself but uses Neoverse IP cores; no custom core yet.
Architectural Licensees get the Arm ISA; they implement their own microarchitecture. The advantage is product differentiation; the cost is ~3-5 years of CPU team build-out. Arm's CSS + Neoverse deliverables exist precisely so customers can skip that.
12
Arm Total Design & CSS (2024)
Launched Oct 2023. A program to make Neoverse-based chiplet design accessible to non-hyperscalers.
Package / UCIe IP from partners (Alphawave, Rambus)
First public CSS adopters: Microsoft Cobalt 200 (reported), Vector Silicon, Rebellions (Korean AI chip), Lexra.
Goal: bring Neoverse-silicon time-to-market down from ~3 years to ~12 months.
Why this matters commercially
Arm's 2023 IPO prospectus flagged that 75%+ of royalty revenue is concentrated in 4-5 licensees. Total Design aims to broaden the Neoverse customer base to dozens of mid-size silicon companies.
CSS as a revenue reshape
Instead of per-core royalty on raw Neoverse IP, Arm now prices CSS as a subsystem. Higher per-chip fee, but saves customers millions in integration + verification.
13
Software Ecosystem — Post-2022 Maturity
OS distributions: Ubuntu, RHEL, Rocky, SUSE, Debian, Fedora all ship first-class arm64 release builds on the same cadence as x86-64.
Kernels: upstream Linux, FreeBSD, and OpenBSD are all well-supported on Neoverse. Azure & AWS run vanilla upstream kernels + small cloud patches.
Runtimes: JVM (HotSpot, OpenJDK, GraalVM), .NET 8, Go, Node.js, Python, Rust, Swift all produce optimised AArch64 builds.
Databases: Postgres, MySQL, MariaDB, Redis, Memcached, ClickHouse, Cassandra, Kafka all tested and tuned for Neoverse.
ML frameworks: PyTorch + TorchInductor, ONNX Runtime, TensorFlow, JAX — all have mature Neoverse paths, with SVE2 used in inner loops.
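As a concrete example of what "first-class arm64" means in practice, build and deployment scripts routinely branch on the host architecture and the SIMD features the kernel reports. A minimal sketch using only the Python standard library (the "sve"/"asimd" strings are the flag names Linux exposes in /proc/cpuinfo on arm64):

```python
# Minimal sketch: detect an AArch64 host and, on Linux, which SIMD
# features the kernel reports. Standard library only; returns an empty
# feature set on non-Linux systems rather than failing.
import platform

def is_aarch64() -> bool:
    return platform.machine().lower() in ("aarch64", "arm64")

def cpu_features() -> set:
    """Parse feature flags from /proc/cpuinfo; empty set if unavailable."""
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                # arm64 labels the line "Features", x86 labels it "flags"
                if line.lower().startswith(("features", "flags")):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

if is_aarch64() and "sve" in cpu_features():
    print("SVE available: pick the SVE-tuned kernel")
else:
    print("fall back to NEON/portable path")
```

Multi-arch CI pipelines make the same decision at build time (separate arm64 and x86-64 artefacts) — this runtime check is the deployment-side half of that story.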
The 2020 inflection
Around 2020-21 the software ecosystem flipped from "Arm support exists but it's a second-class citizen" to "Arm tested in CI, shipping as a first-class artefact". That's when Graviton went from "early adopter" to default choice.
The one soft spot
Legacy proprietary enterprise software — certain ERP, hedge-fund risk models, specific HPC ISV code — still lags on Arm. Gradually closing as Azure + Google add Arm VMs.
14
Where Arm Servers Are Going
Arm share of hyperscale CPU growing to 30-40% by 2026 (Omdia). AWS already >50% new capacity; Azure catching up with Cobalt; Google via Axion; Alibaba via Yitian.
Chiplets + UCIe universal — Neoverse CSS S3 + chiplet packaging turns server CPU design into mix-and-match.
CXL-enabled memory pooling — CMN S3 + CXL 3.0 mean a server can dynamically add/remove memory from a rack-level pool.
CCA / Realms go mainstream — cloud tenants using confidential computing on Arm by default on Azure + AWS by 2026.
CPU AI inference — N3 / V3 bf16/INT8 matmul competes with GPU for small model + edge inference.
On-device generative AI — SME on Cortex-X + Neoverse-class silicon in handsets with ~10 W budget.
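The INT8 inference path these cores accelerate follows a standard pattern — quantise to int8, accumulate in int32, dequantise back to float. A NumPy sketch of that arithmetic (the instruction names in the comment, SDOT/I8MM, are Arm's dot-product/int8-matmul extensions; the code itself is a plain software stand-in, not their implementation):

```python
# Sketch of the INT8 inference pattern that CPU matmul extensions speed up:
# quantise activations/weights to int8, multiply-accumulate in int32,
# dequantise. Pure NumPy stand-in for SDOT/I8MM-style hardware ops.
import numpy as np

def quantize(x):
    """Symmetric per-tensor int8 quantisation: returns (int8 array, scale)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8)).astype(np.float32)   # activations
w = rng.standard_normal((8, 4)).astype(np.float32)   # weights

qa, sa = quantize(a)
qw, sw = quantize(w)

# int32 accumulation — the part hardware dot-product instructions do natively
acc = qa.astype(np.int32) @ qw.astype(np.int32)
approx = acc * (sa * sw)                             # dequantise

err = np.abs(approx - a @ w).max()
print(f"max abs error vs fp32 matmul: {err:.3f}")
```

The win on CPU is that eight int8 lanes fit where two fp32 lanes would, so the same vector unit moves ~4× the elements per cycle — the arithmetic above is what those lanes compute.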
The Arm hire pitch
If you can do Linux kernel arm64, performance analysis with Arm PMU/SPE, or silicon bring-up / SystemReady, you're employable by the hyperscalers, Arm, Ampere, NVIDIA, Marvell, or any one of dozens of CSS licensees.
Skills that transfer
Cortex-A knowledge transfers directly to Neoverse — the server cores are derived from the same core families (N1 from Cortex-A76, V2 from the Cortex-X line). SVE2 code runs identically on Cortex-X for phones and Neoverse V for servers. This shared microarchitecture is Arm's force multiplier.
15
Lessons
"What makes Graviton special?" → N1-class Arm cores, custom AWS silicon design, matches/beats x86 perf at ~60% of cost on cloud-native workloads. Now at Gen 4 with V2 cores.
"What is NVLink-C2C?" → NVIDIA's coherent CPU-GPU link on Grace Hopper. 900 GB/s. Allows GPU to address CPU LPDDR5X as if it were HBM.
"Difference between Ampere Altra and AmpereOne?" → Altra = Neoverse N1 (Arm IP, 80-128 cores). AmpereOne = custom Armv8.6-A core from ex-Nuvia team (192 cores).
"Who is first to N3 / V3?" → Microsoft Cobalt 200 and AWS Graviton 5 are reported to be on N3/V3 class. Both expected 2025.
"What does a DPU need Arm for?" → control plane compute alongside fixed-function network/storage offload. Neoverse N-class cores give perf/W that PCIe-card envelopes (~75 W) require.
"Why the A64FX special mention?" → custom Armv8.2-A, first-ever SVE implementation (512-bit), powered Fugaku. Proved Arm at supercomputing scale before Neoverse existed commercially.
16
References
AWS — Graviton 2 / 3 / 3E / 4 whitepapers at aws.amazon.com/ec2/graviton
NVIDIA — Grace + Grace Hopper architecture whitepaper (2023-24)
Microsoft Azure — Cobalt 100 announcement + launch blog posts (Ignite 2023 / 2024)
Google Cloud — Axion announcement, Google Cloud Next 2024
Ampere Computing — AmpereOne technical briefs; Altra Max product pages
Alibaba Cloud — Yitian 710 engineering announcement papers
Fujitsu — A64FX technology white paper (2019); MONAKA (2024 ISSCC paper)
SiPearl / EuroHPC — Rhea1 announcements; JUPITER exascale documentation
Arm Ltd. — Neoverse annual Tech Day transcripts; Arm Total Design / CSS press kits
ServeTheHome, Phoronix, Chips and Cheese — independent benchmarking + architecture write-ups
Presentation built with Reveal.js 4.6 · Playfair Display + DM Sans + JetBrains Mono
Educational use.