At NVIDIA GTC 2024 we got a preview of the next generation of high-performance GPUs with the announcement of the Blackwell architecture.
Let’s go through what the Blackwell architecture will give you from the perspective of AI training and inference use cases, and how it compares to the best NVIDIA GPUs currently on the market: the A100, H100 and H200.
NVIDIA Blackwell architecture
The Blackwell GPU is designed with the specific purpose of handling data center-scale generative AI workflows. Architecturally, Blackwell GPUs combine two reticle-limited dies into a single, unified GPU with a 10 terabyte-per-second chip-to-chip interface.
NVIDIA named their latest generation of high-performance GPU architecture in honor of the American mathematician and statistician David H. Blackwell.
The Blackwell GPU is the largest GPU ever built, with 208 billion transistors. It also brings major software advancements, including the second-generation Transformer Engine and Confidential Computing capabilities aimed at supporting encrypted enterprise use of generative AI training, inference and federated learning.
What can we expect from the Blackwell architecture
5th Generation NVLink. 2x performance vs. previous NVLink generation and ability to connect up to 576 GPUs in one NVLink setup.
Confidential Computing. New encryption capabilities and ability to perform faster confidential AI training and inference.
2nd Generation Transformer Engine. NVIDIA plans a major update to the Transformer Engine library, including TensorRT-LLM and the NeMo Framework.
Decompression Engine. A new engine can decompress data at rates of up to 800 GB/s to support accelerated data science and analytics.
RAS Engine. New Reliability, Availability and Serviceability (RAS) Engine to identify and diagnose potential faults.
Advanced Networking. Speeds of up to 400 Gb/s using Quantum-2 InfiniBand and Spectrum-X Ethernet.
HGX B100 and HGX B200
NVIDIA plans to release Blackwell GPUs in two different HGX AI supercomputing form factors, the HGX B100 and HGX B200. While these will share many of the same components, the B200 will have a higher maximum thermal design power (TDP) and overall higher performance across FP4, FP8, INT8, FP16, FP32 and FP64 workloads.
At the time of the announcement, both the B100 and B200 were expected to ship with the same 192 GB of HBM3e memory and up to 8 TB/s of memory bandwidth.
B200 vs B100 Spec Sheet Comparison
Specification | HGX B200 | HGX B100 |
---|---|---|
Blackwell GPUs | 8 | 8
FP4 Tensor Core (total) | 144 PFLOPS | 112 PFLOPS
FP8/FP6/INT8 (total) | 72 PFLOPS | 56 PFLOPS
Fast Memory | Up to 1.5 TB | Up to 1.5 TB
Aggregate Memory Bandwidth | Up to 64 TB/s | Up to 64 TB/s
Aggregate NVLink Bandwidth | 14.4 TB/s | 14.4 TB/s
FP4 Tensor Core (per GPU) | 18 PFLOPS | 14 PFLOPS
FP8/FP6 Tensor Core (per GPU) | 9 PFLOPS | 7 PFLOPS
INT8 Tensor Core (per GPU) | 9 petaOPS | 7 petaOPS
FP16/BF16 Tensor Core (per GPU) | 4.5 PFLOPS | 3.5 PFLOPS
TF32 Tensor Core (per GPU) | 2.2 PFLOPS | 1.8 PFLOPS
FP32 (per GPU) | 80 TFLOPS | 60 TFLOPS
FP64 Tensor Core (per GPU) | 40 TFLOPS | 30 TFLOPS
FP64 (per GPU) | 40 TFLOPS | 30 TFLOPS
GPU Memory / Bandwidth (per GPU) | Up to 192 GB HBM3e / Up to 8 TB/s | Up to 192 GB HBM3e / Up to 8 TB/s
Max Thermal Design Power (TDP) | 1,000W | 700W
Interconnect | NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s | NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s
Note: All petaFLOPS and petaOPS figures are with sparsity, except FP64, which is dense.
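To see how the aggregate numbers in the table relate to the per-GPU figures, here is a minimal back-of-envelope sketch in Python. The per-GPU values come from the spec sheet above; the 2x sparsity factor is an assumption based on NVIDIA's usual 2:4 structured-sparsity convention, not a figure from the announcement.

```python
# Back-of-envelope sketch: deriving the aggregate HGX B200 numbers
# from the per-GPU figures in the spec table above.

NUM_GPUS = 8                      # GPUs per HGX B200 baseboard

per_gpu_memory_gb = 192           # HBM3e per GPU
per_gpu_bandwidth_tbs = 8         # memory bandwidth per GPU, TB/s
per_gpu_fp4_sparse_pflops = 18    # FP4 Tensor Core, with sparsity

aggregate_memory_tb = NUM_GPUS * per_gpu_memory_gb / 1000
aggregate_bandwidth_tbs = NUM_GPUS * per_gpu_bandwidth_tbs
aggregate_fp4_pflops = NUM_GPUS * per_gpu_fp4_sparse_pflops

# Assumption: dense throughput is half the sparse figure under
# NVIDIA's 2:4 structured-sparsity convention.
per_gpu_fp4_dense_pflops = per_gpu_fp4_sparse_pflops / 2

print(f"Fast memory:       ~{aggregate_memory_tb:.1f} TB")    # ~1.5 TB
print(f"Memory bandwidth:  {aggregate_bandwidth_tbs} TB/s")   # 64 TB/s
print(f"FP4 (sparse):      {aggregate_fp4_pflops} PFLOPS")    # 144 PFLOPS
print(f"FP4 (dense, est.): {per_gpu_fp4_dense_pflops} PFLOPS per GPU")
```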
GB200 Grace Blackwell Superchip
In addition to the HGX form factors, NVIDIA have announced a new Superchip combining two Blackwell Tensor Core GPUs and one NVIDIA Grace CPU. These Superchips can be connected in clusters; for example, the NVL72 configuration connects 36 Grace CPUs and 72 Blackwell GPUs into what acts as one massive GPU, delivering 30x faster LLM inference than the H100.
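To put the NVL72 configuration into perspective, the following sketch totals per-GPU resources across the full rack. It assumes each GB200 GPU matches the per-GPU HGX B200 figures from the table above, which may not hold exactly for the Superchip variant, so treat the totals as rough estimates.

```python
# Rough totals for a GB200 NVL72 rack (36 Grace CPUs, 72 Blackwell
# GPUs). Per-GPU figures are taken from the HGX B200 spec table;
# the GB200 variant may differ, so these are ballpark estimates.

num_cpus = 36
num_gpus = 72

hbm_per_gpu_gb = 192              # HBM3e per Blackwell GPU
bandwidth_per_gpu_tbs = 8         # memory bandwidth per GPU
fp4_sparse_per_gpu_pflops = 18    # FP4 Tensor Core, with sparsity

total_hbm_tb = num_gpus * hbm_per_gpu_gb / 1000
total_bandwidth_tbs = num_gpus * bandwidth_per_gpu_tbs
total_fp4_eflops = num_gpus * fp4_sparse_per_gpu_pflops / 1000

print(f"{num_cpus} Grace CPUs + {num_gpus} Blackwell GPUs")
print(f"Total HBM3e:        ~{total_hbm_tb:.1f} TB")          # ~13.8 TB
print(f"Total memory b/w:   ~{total_bandwidth_tbs} TB/s")     # ~576 TB/s
print(f"FP4 (sparse) total: ~{total_fp4_eflops:.2f} EFLOPS")  # ~1.3 EFLOPS
```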
See more information on the H100 specs and performance.
Blackwell vs Hopper Comparison
Ahead of the launch of the Blackwell generation of GPUs, NVIDIA have released benchmark comparisons against the Hopper architecture.
In large-model training (with models such as GPT-MoE-1.8T), the B200 is 3x faster than the H100.
The B200 also achieves up to 15x higher inference performance than the H100 on large models such as GPT-MoE-1.8T.
Keep in mind that before the Blackwell series is released, the Hopper architecture will get a serious upgrade in the form of the H200.
GB200 vs H100 Benchmarks
NVIDIA have released benchmark data comparing the GB200 Superchip to the NVIDIA H100.
Compared to the H100, the GB200 is expected to deliver a 30x speedup on resource-intensive applications.
Cadence SpectreX simulations are expected to run 13x faster on the GB200 than on the H100.
The GB200 is also expected to offer up to 25x better energy efficiency, translating into a 25x lower total cost of ownership compared to the H100.
NVIDIA B200 vs. AMD MI300X
Another major point of comparison is the MI300X, recently released by AMD. While NVIDIA have a clear lead in the market for AI-focused GPUs, AMD and other major players have brought competing products to market.
While the MI300X is already available today, the most relevant comparison is with the B200, as both high-performance GPUs should see wider availability by early 2025.
Specification | NVIDIA B200 | AMD MI300X |
---|---|---|
GPU Memory | 192 GB | 192 GB
Memory Type | HBM3e | HBM3
Peak Memory Bandwidth | 8 TB/s | 5.3 TB/s
Interconnect | NVLink: 1.8 TB/s, PCIe Gen6: 256 GB/s | PCIe Gen5: 128 GB/s
Max Thermal Design Power (TDP) | 1,000W | 750W
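Peak memory bandwidth is more than a headline number: during autoregressive LLM decoding, the model weights must be streamed from HBM for every generated token, so bandwidth sets a hard floor on per-token latency. The sketch below estimates that floor for a hypothetical 70B-parameter FP16 model on a single GPU; it deliberately ignores compute, KV-cache traffic and multi-GPU communication, so it is an upper bound on throughput, not a benchmark.

```python
# Why peak memory bandwidth matters for inference: each decoded
# token requires reading all model weights from HBM, so
# weight_bytes / bandwidth is a lower bound on per-token latency.

def min_latency_per_token_ms(params_billions: float,
                             bytes_per_param: float,
                             bandwidth_tbs: float) -> float:
    """Lower bound on decode latency: weight bytes / memory bandwidth."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    seconds = weight_bytes / (bandwidth_tbs * 1e12)
    return seconds * 1000

# Hypothetical 70B-parameter model at FP16 (2 bytes per parameter).
for name, bw in [("B200 (8 TB/s)", 8.0), ("MI300X (5.3 TB/s)", 5.3)]:
    latency = min_latency_per_token_ms(70, 2, bw)
    print(f"{name}: >= {latency:.1f} ms/token, "
          f"<= {1000 / latency:.0f} tokens/s")
```

Under these assumptions the B200's bandwidth advantage translates directly into a roughly 1.5x higher ceiling on tokens per second for memory-bound decoding.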
Naturally, you can’t compare NVIDIA and AMD GPUs on hardware specs alone. NVIDIA still holds a massive edge when it comes to software, with many AI engineers finding it difficult to move away from CUDA and the frameworks built around it.
Availability and pricing
Don’t expect to get access to these high-performance devices any time soon. The earliest the B100 is expected to be available is Q4 2024, and the B200 is not likely to be available before 2025. No release date has been announced for the GB200.
Also, NVIDIA have not released any pricing information.
You don’t need to wait to get access to high-performance GPUs. DataCrunch offers a broad range of premium NVIDIA GPUs at competitive prices. See the latest cloud GPU pricing and availability.
Bottom line on Blackwell architecture
Competition in the GPU race is heating up, and NVIDIA are not resting on their laurels. The early release of specs for the Blackwell architecture suggests that NVIDIA will continue to deliver the highest-performance GPUs both today and in the years to come.
It may be quite some time before you get access to the B100, B200 or the GB200, but when they do arrive you can expect DataCrunch to give you quick access, fair pricing and in-depth performance benchmarks.