Selecting a cloud GPU can be a daunting experience. There are numerous factors to consider, each affecting the others, which makes it a surprisingly difficult choice when starting a Deep Learning project. In particular, we believe users must always weigh their compute requirements, such as the amount of VRAM and the various throughput specifications, against the corresponding cost.
For the last year, the most powerful NVIDIA GPU available on the DigitalOcean Gradient™ AI Agentic Cloud has been the NVIDIA H100. Released in late 2022, the NVIDIA H100 has been one of the driving engines behind the current AI revolution. But it was quickly followed by a powerful sister machine: the NVIDIA H200. Beginning to ship in 2024, the NVIDIA HGX H200 quickly supplanted its sibling as the most powerful GPU for AI on the market.
We are excited to announce that NVIDIA HGX H200s are now available as a GPU Droplet! With that announcement, we want to provide this follow-up to our last article, “What is an H100?”, where we expand on the capabilities of these machines, examine the technical specifications of NVIDIA H200 GPUs, and discuss our beliefs about when to choose this GPU for your Deep Learning task.
Follow along for an in-depth look at the H200, and get ready to use them for your own projects on DigitalOcean!
The Graphics Processing Unit, or GPU, is the engine that powers the AI revolution unfolding in front of us. In practice, these machines perform the innumerable calculations that together make up a Deep Learning model’s training and inference.
While the exact components vary between GPU manufacturers, the core functionality remains the same. Modern GPUs typically contain a number of multiprocessor computational units, each with a shared memory block plus a number of processors and corresponding registers. The GPU itself has constant memory, plus device memory on the board that houses it. NVIDIA GPUs, for example, perform their calculations across computational units called CUDA cores. Each of these cores can perform a computation in parallel with the others, and it is this massive parallelism that allows machine learning models to detect and learn patterns in data at scale.
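To make this concrete, here is a minimal sketch (assuming a machine with an NVIDIA GPU and a CUDA-enabled PyTorch install) in which a single matrix multiplication is scheduled across thousands of CUDA cores at once:

```python
import torch

# A minimal sketch: run one large matrix multiplication on the GPU.
# Assumes an NVIDIA GPU and a CUDA-enabled PyTorch build are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# This single call is decomposed into many tile-level operations that the GPU
# schedules across its CUDA cores (and Tensor Cores) in parallel.
c = a @ b
if device == "cuda":
    torch.cuda.synchronize()  # wait for the asynchronous GPU work to finish

print(c.shape, c.device)
```

The same call on a CPU runs across at most a handful of threads; on a modern data center GPU it is spread across thousands of parallel execution units, which is where the speedups for deep learning come from.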
Let’s take another look at the new features of Hopper GPUs, originally discussed in our NVIDIA H100 writeup. There are a number of notable upgrades in the Hopper Microarchitecture, including improvements to the Tensor Core technology, the introduction of the Transformer Engine, and much more. Let’s look more closely at some of the most noticeable upgrades.
In the Hopper Microarchitecture, the Fourth-Generation Tensor Cores represent perhaps the most significant advancement for deep learning and AI practitioners, delivering peak throughput gains of up to 6× chip-to-chip compared to the Ampere Tensor Core generation. Central to this leap is NVIDIA’s Transformer Engine, which pairs the Tensor Cores with software designed specifically to accelerate transformer-based models, allowing computations to adapt dynamically between FP8 and FP16 precision to maximize both speed and efficiency.
Because Tensor Core FLOPS in FP8 are twice those of 16-bit operations, running deep learning models in this format offers greater efficiency and cost savings. The trade-off, however, is a potential drop in numerical precision. NVIDIA’s Transformer Engine addresses this challenge by intelligently switching between FP8 and FP16 on a per-layer basis, capturing much of FP8’s throughput advantage while preserving accuracy close to that of 16-bit computation. As reported by NVIDIA, “the NVIDIA Hopper architecture in particular also advances fourth-generation Tensor Cores by tripling the floating-point operations per second compared with prior-generation TF32, FP64, FP16 and INT8 precisions” (Source).
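For a sense of what this looks like in practice, NVIDIA ships an open-source Transformer Engine library (transformer_engine.pytorch) whose fp8_autocast context lets supported layers run their matrix multiplications in FP8 on Hopper-class GPUs. The sketch below is illustrative only; the layer sizes and recipe settings are assumptions, not tuned values:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# A minimal sketch, assuming an H100/H200 GPU and the transformer_engine
# package are installed; layer sizes and recipe values are illustrative.
layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(8, 1024, device="cuda")

# HYBRID recipe: FP8 E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMM inside this layer runs on the FP8 Tensor Cores

print(y.shape, y.dtype)
```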
MIG, or Multi-Instance GPU, is the technology that allows a single GPU to be partitioned into fully contained and isolated instances, each with its own memory, cache, and compute cores (Source). In H100s, second-generation MIG technology enhances this even further by enabling the GPU to be split into seven secure GPU instances with multi-tenant, multi-user configurations in virtual environments. In deployment, this architecture enables multi-tenant GPU sharing with strong hardware-enforced isolation, a critical requirement for secure cloud operations. Each GPU instance is provisioned with dedicated video decoders, enabling intelligent video analytics (IVA) on the shared infrastructure with telemetry streamed directly to monitoring pipelines. Leveraging Hopper’s concurrent MIG profiling, administrators can perform fine-grained tracking of utilization metrics and dynamically optimize resource partitioning across workloads, ensuring both performance consistency and operational efficiency (Source).
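As a small illustration of the isolation MIG provides, a process can be restricted to a single MIG instance by exposing only that instance to it. The UUID below is a placeholder; real instance UUIDs are listed by nvidia-smi -L on a MIG-enabled GPU:

```python
import os

# Placeholder MIG instance UUID; list real ones with `nvidia-smi -L`.
# Setting CUDA_VISIBLE_DEVICES before importing torch restricts this process
# to that single isolated slice of the physical GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # The reported memory reflects the MIG slice, not the full physical GPU.
    print(props.name, f"{props.total_memory / 1e9:.1f} GB")
```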
NVLink and NVSwitch are the NVIDIA GPU technologies that facilitate the connection of multiple GPUs to one another in an integrated system. NVLink is the bidirectional interconnect hardware that allows GPUs to share data with one another, and NVSwitch is a chip that connects the NVLink interfaces of the different GPUs in a multi-GPU system. This technology has been improved upon with each generation of microarchitecture since its release. In H100s, Fourth-generation NVLink scales multi-GPU input and output (IO) up to 900 gigabytes per second (GB/s) bidirectional per GPU, estimated to be over 7X the bandwidth of PCIe Gen5 (Source). This means that GPUs can exchange information with one another at significantly higher speeds than was possible with Ampere, and this innovation is responsible for many of the speed-ups reported for H100 multi-GPU systems in marketing materials. Next, Third-generation NVIDIA NVSwitch supports Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) in-network computing, and provides a 2X increase in all-reduce throughput within eight-GPU H100 servers compared to previous-generation A100 Tensor Core GPU systems (Source). In practical terms, this means the newest generation of NVSwitch can more effectively and efficiently oversee operations across the multi-GPU system, allocate resources where needed, and increase throughput dramatically on DGX systems.
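To make the all-reduce terminology concrete, here is a minimal sketch of a multi-GPU all-reduce using PyTorch’s NCCL backend, which rides on NVLink/NVSwitch when they are available. It assumes a single node with several GPUs and is launched with torchrun:

```python
import os
import torch
import torch.distributed as dist

# A minimal sketch, assuming one node with multiple NVIDIA GPUs and a CUDA
# build of PyTorch. Launch with: torchrun --nproc_per_node=<num_gpus> <script>.py
def main():
    dist.init_process_group(backend="nccl")  # NCCL uses NVLink/NVSwitch when present
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them across every GPU.
    # On NVSwitch systems with SHARP, part of this reduction happens in-network.
    x = torch.ones(1024, device="cuda") * (local_rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if local_rank == 0:
        print("sum of first element across ranks:", x[0].item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```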
A common concern in the era of Big Data is security. While data is often stored and transferred in encrypted formats, this provides no protection against bad actors who can access the data while it is being processed. With the release of the Hopper microarchitecture, NVIDIA introduced a novel solution to this problem: Confidential Computing. It removes much of the risk of data being stolen during processing by creating a hardware-based trusted execution environment where workloads run in isolation from the rest of the computer system. Because the entire workload is processed inside this inaccessible, trusted environment, it becomes much more difficult for an attacker to reach the protected data.
| Specification | NVIDIA H100 SXM | NVIDIA H200 |
| --- | --- | --- |
| Form Factor | SXM | SXM |
| FP64 | 34 TFLOPS | 34 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 67 TFLOPS |
| FP32 | 67 TFLOPS | 67 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS | 989 TFLOPS |
| BFLOAT16 Tensor Core | 1,979 TFLOPS | 1,979 TFLOPS |
| FP16 Tensor Core | 1,979 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | 3,958 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 3,958 TOPS | 3,958 TOPS |
| GPU Memory | 80GB | 141GB |
| GPU Memory Bandwidth | 3.35 TB/s | 4.8 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max Thermal Design Power | Up to 700W | Up to 1,000W |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 16.5GB each |
| Interconnect | NVLink: 900GB/s, PCIe Gen5: 128GB/s | NVLink: 900GB/s, PCIe Gen5: 128GB/s |
Above we can see a direct, head-to-head comparison of key specifications of the NVIDIA H100 and H200 GPUs. This table succinctly describes the differences between the two machines.
First, we should call attention to the fact that the raw compute throughput, measured in operations per second at each numerical precision, is identical for the two machines. This means the H200 is not inherently faster at performing the computations themselves.
Second, GPU memory bandwidth, or the maximum rate at which data can be transferred between the GPU’s processing cores and its memory (VRAM), is significantly higher in the H200. So while individual calculations are not necessarily performed faster, data moves between memory and the CUDA cores far more quickly, which keeps the cores fed and improves real-world throughput on memory-bound workloads.
Finally, the GPU memory is significantly larger on the NVIDIA H200. Much larger models can be loaded onto the machine, and larger batch sizes can be used for inference and training. In practice, this can be the difference between loading a model at full precision and having to fall back to a lower-precision quantization, such as a GGUF quantized version (see the sketch below).
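As a back-of-the-envelope sketch (the parameter counts below are illustrative assumptions), you can estimate whether a model’s weights fit into GPU memory at a given precision:

```python
# Rough, illustrative estimate of model weight memory at different precisions.
# Real memory use also includes activations, KV cache, and framework overhead,
# so treat these figures as lower bounds.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8/int8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for params in (8e9, 70e9):  # e.g. 8B- and 70B-parameter models (illustrative)
    for precision in BYTES_PER_PARAM:
        print(f"{params / 1e9:.0f}B params @ {precision}: "
              f"~{weight_memory_gb(params, precision):.0f} GB of weights")
```

By this estimate, a 70B-parameter model at 16-bit precision needs roughly 140GB for its weights alone, which just fits within a single H200’s 141GB but is far beyond a single H100’s 80GB without quantization or multi-GPU sharding.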
Selecting the best GPU for a given task can be a daunting challenge. There are numerous options available from different providers, from different generations, and with different computational specifications. The most important thing to consider, at the end of the day, is the use case itself. Are we training a particularly large model? Do we need large batch sizes for inference? How long do we have to run the training? Is there a time crunch? Asking these questions gives us more insight than anything else, and in our experience they consistently point to the same answer: the NVIDIA H200.
At the end of the day, the question of efficiency comes down to cost. The H200’s exceptional throughput and memory make it the more price-efficient GPU, and in our view the better option than the NVIDIA H100 in nearly every case. It is truly an upgrade in every way.
The NVIDIA H200 is the most powerful GPU on the DigitalOcean Gradient™ AI Agentic Cloud. The capabilities offered by this machine are already powering the AI revolution unfolding in front of us. Thanks to its advancements over previous microarchitectures and its cost efficiency compared to the H100, the NVIDIA HGX H200 is our recommendation for any training or inference project.