Nvidia's Blackwell GB200 NVL72 Is Reshaping the Data Center — and the AI Compute Race
Nvidia's Blackwell architecture is landing in hyperscale data centers at unprecedented scale, delivering up to 30x the large language model inference performance of the H100 — and cementing Nvidia's grip on the AI compute stack.

Nvidia's Blackwell GPU generation is not an incremental upgrade. The GB200 NVL72 — a rack-scale system containing 36 Grace CPUs and 72 Blackwell B200 GPUs linked by NVLink 5 — delivers large language model inference performance that has changed the economics of AI deployment for every major hyperscaler.
At peak, the NVL72 achieves 1.4 exaFLOPS of FP4 tensor performance for inference workloads, roughly 20 petaFLOPS per GPU; for comparison, a single H100 SXM5 delivers approximately 4 petaFLOPS of FP8 inference compute. The improvement is not merely from raw compute: NVLink 5's 1.8 TB/s of bidirectional bandwidth per GPU means the 72-GPU system behaves as a single logical accelerator for models up to approximately 14 trillion parameters, eliminating the inter-node communication bottlenecks that limited H100 cluster efficiency.
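These figures are easier to see as a quick back-of-envelope calculation. The Python sketch below derives the per-GPU throughput from the rack-level spec and shows one way the roughly 14-trillion-parameter ceiling can fall out of the rack's pooled memory; the 13.5 TB HBM3e pool and FP8 weight storage are assumptions introduced here for illustration, not figures from this article.

```python
# Back-of-envelope check on the figures above. Rack-level FLOPS, GPU count,
# and NVLink bandwidth follow Nvidia's published GB200 NVL72 specs; the
# 13.5 TB pooled HBM3e and FP8 weight storage are assumptions used here
# only to illustrate where a ~14-trillion-parameter ceiling can come from.

RACK_FP4_EXAFLOPS = 1.4      # NVL72 peak FP4 tensor throughput
GPUS_PER_RACK = 72           # two B200 GPUs per GB200 superchip, 36 superchips
GRACE_CPUS = 36              # one Grace CPU per superchip
H100_FP8_PETAFLOPS = 4.0     # approx. FP8 tensor throughput of one H100 SXM5

print(f"Rack topology: {GRACE_CPUS} Grace CPUs + {GPUS_PER_RACK} B200 GPUs")

per_gpu_fp4_petaflops = RACK_FP4_EXAFLOPS * 1000 / GPUS_PER_RACK
print(f"FP4 per B200: ~{per_gpu_fp4_petaflops:.0f} PFLOPS "
      f"(vs ~{H100_FP8_PETAFLOPS:.0f} PFLOPS FP8 per H100)")

# Weight capacity: assume the rack's pooled HBM3e (~13.5 TB) holds model
# weights stored at FP8 (1 byte per parameter), ignoring KV cache and
# activations, which is an optimistic simplification.
POOLED_HBM_TB = 13.5
BYTES_PER_PARAM = 1.0        # FP8 weights
print(f"Weight capacity: ~{POOLED_HBM_TB / BYTES_PER_PARAM:.1f}T parameters")
```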
What This Means for LLM Inference Economics
The practical consequence is a dramatic reduction in per-token inference cost at scale. Early deployers report costs 60-70% below H100 deployments for equivalent throughput. For companies like OpenAI, Anthropic, and Google, whose inference costs run into hundreds of millions of dollars annually, this is not a marginal improvement — it is a structural shift in unit economics.
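To see how a throughput gain becomes a per-token cost reduction, the sketch below compares cost per million tokens for two deployments. Every input (hourly rack cost, aggregate tokens per second) is hypothetical and chosen only to land in the 60-70% range cited above; none of the numbers are reported prices or measured benchmarks.

```python
# Illustrative per-token cost comparison. All inputs are hypothetical and
# chosen only to show the mechanics; they are not reported prices or
# measured benchmarks.

def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens for a deployment."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Hypothetical H100 cluster: 72 GPUs at $2.50 per GPU-hour, 150k tokens/s aggregate.
h100_cost = cost_per_million_tokens(hourly_cost_usd=72 * 2.50, tokens_per_second=150_000)

# Hypothetical GB200 NVL72 rack: higher hourly cost, much higher aggregate throughput.
nvl72_cost = cost_per_million_tokens(hourly_cost_usd=400.0, tokens_per_second=1_000_000)

print(f"H100 cluster : ${h100_cost:.3f} per 1M tokens")
print(f"GB200 NVL72  : ${nvl72_cost:.3f} per 1M tokens")
print(f"Reduction    : {(1 - nvl72_cost / h100_cost) * 100:.0f}%")   # ~67%
```

The mechanic is simple: per-token cost is dollars per hour divided by tokens per hour, so a rack that costs a bit over twice as much per hour but serves six to seven times the tokens cuts the unit cost by roughly two-thirds, which is the shape of the savings early deployers describe.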