Selecting a cloud GPU can be a daunting experience. There are numerous factors to consider, each affecting the others, which makes it a surprisingly difficult choice when starting a Deep Learning project. In particular, we believe users must always weigh their compute requirements, such as the amount of VRAM and the various throughput specifications, against the corresponding cost.
For the last year, the most powerful NVIDIA GPU available on the DigitalOcean Gradient™ AI Agentic Cloud has been the NVIDIA H100. Released in late 2022, the NVIDIA H100 has been one of the driving engines behind the current AI revolution. But it was quickly followed by a powerful sister machine: the NVIDIA H200. Beginning to ship in 2024, the NVIDIA HGX H200 quickly supplanted its sibling as the most powerful GPU for AI on the market.
We are excited to announce that NVIDIA HGX H200s are now available as a GPU Droplet! With that announcement, we want to provide this follow-up to our last article, “What is an H100?”, where we expand on the capabilities of these machines, examine the technical specifications of NVIDIA H200 GPUs, and discuss our beliefs about when to choose this GPU for your Deep Learning task.
Follow along for an in-depth look at the H200, and get ready to use them for your own projects on DigitalOcean!
The Graphics Processing Unit, or GPU, is the engine that powers the AI revolution unfolding in front of us. In practice, these machines perform the innumerable calculations that together make up a Deep Learning model’s training and inference.
While the exact components vary between GPU manufacturers, the core functionality remains the same. Modern GPUs typically contain a number of multiprocessor computational units, each with a shared memory block plus a number of processors and corresponding registers. The GPU itself has constant memory, plus device memory on the board that houses it. NVIDIA GPUs, for example, perform their calculations across computational units called CUDA cores. Each of these cores can perform a computation in parallel with the others, and it is this massive parallelism that allows machine learning models to detect and learn patterns in data at scale.
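To make this concrete, here is a minimal sketch (assuming a machine with an NVIDIA GPU and a CUDA-enabled PyTorch install) in which a single matrix multiplication is scheduled across thousands of CUDA cores at once:

```python
import torch

# A minimal sketch: run one large matrix multiplication on the GPU.
# Assumes an NVIDIA GPU and a CUDA-enabled PyTorch build are available.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# This single call is decomposed into many tile-level operations that the GPU
# schedules across its CUDA cores (and Tensor Cores) in parallel.
c = a @ b
if device == "cuda":
    torch.cuda.synchronize()  # wait for the asynchronous GPU work to finish

print(c.shape, c.device)
```

The same call on a CPU runs across at most a handful of threads; on a modern data center GPU it is spread across thousands of parallel execution units, which is where the speedups for deep learning come from.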
Let’s take another look at the new features of Hopper GPUs, originally discussed in our NVIDIA H100 writeup. There are a number of notable upgrades in the Hopper Microarchitecture, including improvements to the Tensor Core technology, the introduction of the Transformer Engine, and much more. Let’s look more closely at some of the most noticeable upgrades.
In the Hopper Microarchitecture, the Fourth-Generation Tensor Cores represent perhaps the most significant advancement for deep learning and AI practitioners, delivering peak throughput gains of up to 6× chip-to-chip compared to the Ampere Tensor Core generation. Central to this leap is NVIDIA’s Transformer Engine, which pairs the Tensor Cores with software designed specifically to accelerate transformer-based models, allowing computations to adapt dynamically between FP8 and FP16 precision to maximize both speed and efficiency.
Because Tensor Core FLOPS in FP8 are twice those of 16-bit operations, running deep learning models in this format offers greater efficiency and cost savings. The trade-off, however, is a potential drop in numerical precision. NVIDIA’s Transformer Engine addresses this challenge by intelligently switching between FP8 and FP16 on a per-layer basis, capturing much of FP8’s throughput advantage while preserving accuracy close to that of 16-bit computation. As reported by NVIDIA, “the NVIDIA Hopper architecture in particular also advances fourth-generation Tensor Cores by tripling the floating-point operations per second compared with prior-generation TF32, FP64, FP16 and INT8 precisions” (Source).
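For a sense of what this looks like in practice, NVIDIA ships an open-source Transformer Engine library (transformer_engine.pytorch) whose fp8_autocast context lets supported layers run their matrix multiplications in FP8 on Hopper-class GPUs. The sketch below is illustrative only; the layer sizes and recipe settings are assumptions, not tuned values:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# A minimal sketch, assuming an H100/H200 GPU and the transformer_engine
# package are installed; layer sizes and recipe values are illustrative.
layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(8, 1024, device="cuda")

# HYBRID recipe: FP8 E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the GEMM inside this layer runs on the FP8 Tensor Cores

print(y.shape, y.dtype)
```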
MIG, or Multi-Instance GPU, is the technology that allows a single GPU to be partitioned into fully contained and isolated instances, each with its own memory, cache, and compute cores (Source). In H100s, second-generation MIG technology enhances this even further by enabling the GPU to be split into seven secure GPU instances with multi-tenant, multi-user configurations in virtual environments. In deployment, this architecture enables multi-tenant GPU sharing with strong hardware-enforced isolation, a critical requirement for secure cloud operations. Each GPU instance is provisioned with dedicated video decoders, enabling intelligent video analytics (IVA) on the shared infrastructure with telemetry streamed directly to monitoring pipelines. Leveraging Hopper’s concurrent MIG profiling, administrators can perform fine-grained tracking of utilization metrics and dynamically optimize resource partitioning across workloads, ensuring both performance consistency and operational efficiency (Source).
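As a small illustration of the isolation MIG provides, a process can be restricted to a single MIG instance by exposing only that instance to it. The UUID below is a placeholder; real instance UUIDs are listed by nvidia-smi -L on a MIG-enabled GPU:

```python
import os

# Placeholder MIG instance UUID; list real ones with `nvidia-smi -L`.
# Setting CUDA_VISIBLE_DEVICES before importing torch restricts this process
# to that single isolated slice of the physical GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # The reported memory reflects the MIG slice, not the full physical GPU.
    print(props.name, f"{props.total_memory / 1e9:.1f} GB")
```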
NVLink and NVSwitch are the NVIDIA GPU technologies that facilitate the connection of multiple GPUs to one another in an integrated system. NVLink is the bidirectional interconnect hardware that allows GPUs to share data with one another, and NVSwitch is a chip that connects the NVLink interfaces of the different GPUs in a multi-GPU system. This technology has been improved upon with each generation of microarchitecture since its release. In H100s, Fourth-generation NVLink scales multi-GPU input and output (IO) up to 900 gigabytes per second (GB/s) bidirectional per GPU, estimated to be over 7X the bandwidth of PCIe Gen5 (Source). This means that GPUs can exchange information with one another at significantly higher speeds than was possible with Ampere, and this innovation is responsible for many of the speed-ups reported for H100 multi-GPU systems in marketing materials. Next, Third-generation NVIDIA NVSwitch supports Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) in-network computing, and provides a 2X increase in all-reduce throughput within eight-GPU H100 servers compared to previous-generation A100 Tensor Core GPU systems (Source). In practical terms, this means the newest generation of NVSwitch can more effectively and efficiently oversee operations across the multi-GPU system, allocate resources where needed, and increase throughput dramatically on DGX systems.
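To make the all-reduce terminology concrete, here is a minimal sketch of a multi-GPU all-reduce using PyTorch’s NCCL backend, which rides on NVLink/NVSwitch when they are available. It assumes a single node with several GPUs and is launched with torchrun:

```python
import os
import torch
import torch.distributed as dist

# A minimal sketch, assuming one node with multiple NVIDIA GPUs and a CUDA
# build of PyTorch. Launch with: torchrun --nproc_per_node=<num_gpus> <script>.py
def main():
    dist.init_process_group(backend="nccl")  # NCCL uses NVLink/NVSwitch when present
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor; all_reduce sums them across every GPU.
    # On NVSwitch systems with SHARP, part of this reduction happens in-network.
    x = torch.ones(1024, device="cuda") * (local_rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if local_rank == 0:
        print("sum of first element across ranks:", x[0].item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```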
A common concern in the era of Big Data is security. While data is often stored and transferred in encrypted formats, this provides no protection against bad actors who can access the data while it is being processed. With the release of the Hopper microarchitecture, NVIDIA introduced a novel solution to this problem: Confidential Computing. It removes much of the risk of data being stolen during processing by creating a hardware-based trusted execution environment where workloads run in isolation from the rest of the computer system. Because the entire workload is processed inside this inaccessible, trusted environment, it becomes much more difficult for an attacker to reach the protected data.
| Specification | NVIDIA H100 SXM | NVIDIA H200 |
| --- | --- | --- |
| Form Factor | SXM | SXM |
| FP64 | 34 TFLOPS | 34 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 67 TFLOPS |
| FP32 | 67 TFLOPS | 67 TFLOPS |
| TF32 Tensor Core | 989 TFLOPS | 989 TFLOPS |
| BFLOAT16 Tensor Core | 1,979 TFLOPS | 1,979 TFLOPS |
| FP16 Tensor Core | 1,979 TFLOPS | 1,979 TFLOPS |
| FP8 Tensor Core | 3,958 TFLOPS | 3,958 TFLOPS |
| INT8 Tensor Core | 3,958 TOPS | 3,958 TOPS |
| GPU Memory | 80GB | 141GB |
| GPU Memory Bandwidth | 3.35 TB/s | 4.8 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max Thermal Design Power | Up to 700W | Up to 1,000W |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 16.5GB each |
| Interconnect | NVLink: 900GB/s, PCIe Gen5: 128GB/s | NVLink: 900GB/s, PCIe Gen5: 128GB/s |
Above we can see a direct, head-to-head comparison of key specifications of the NVIDIA H100 and H200 GPUs. This table succinctly describes the differences between the two machines.
First, we should call attention to the fact that the raw compute throughput, measured in operations per second at each numerical precision, is identical for the two machines. This means the H200 is not inherently faster at performing the computations themselves.
Second, GPU memory bandwidth, or the maximum rate at which data can be transferred between the GPU’s processing cores and its memory (VRAM), is significantly higher in the H200. So while individual calculations are not necessarily performed faster, data moves between memory and the CUDA cores far more quickly, which keeps the cores fed and improves real-world throughput on memory-bound workloads.
Finally, the GPU memory is significantly larger on the NVIDIA H200. Much larger models can be loaded onto the machine, and larger batch sizes can be used for inference and training. In practice, this can be the difference between loading a model at full precision and having to fall back to a lower-precision quantization, such as a GGUF quantized version (see the sketch below).
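As a back-of-the-envelope sketch (the parameter counts below are illustrative assumptions), you can estimate whether a model’s weights fit into GPU memory at a given precision:

```python
# Rough, illustrative estimate of model weight memory at different precisions.
# Real memory use also includes activations, KV cache, and framework overhead,
# so treat these figures as lower bounds.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8/int8": 1}

def weight_memory_gb(num_params: float, precision: str) -> float:
    return num_params * BYTES_PER_PARAM[precision] / 1e9

for params in (8e9, 70e9):  # e.g. 8B- and 70B-parameter models (illustrative)
    for precision in BYTES_PER_PARAM:
        print(f"{params / 1e9:.0f}B params @ {precision}: "
              f"~{weight_memory_gb(params, precision):.0f} GB of weights")
```

By this estimate, a 70B-parameter model at 16-bit precision needs roughly 140GB for its weights alone, which just fits within a single H200’s 141GB but is far beyond a single H100’s 80GB without quantization or multi-GPU sharding.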
Selecting the best GPU for a given task can be a daunting challenge. There are numerous options available from different providers, from different generations, and with different computational specifications. The most important thing to consider, at the end of the day, is the use case itself. Are we training a particularly large model? Do we need large batch sizes for inference? How long do we have to run the training? Is there a time crunch? Asking these questions gives us more insight than anything else, and in our experience they consistently point to the same answer: the NVIDIA H200.
At the end of the day, the question of efficiency comes down to cost. The H200’s exceptional throughput and memory make it the more price-efficient GPU, and in our view the better option than the NVIDIA H100 in nearly every case. It is truly an upgrade in every way.
The NVIDIA H200 is the most powerful GPU on the DigitalOcean Gradient™ AI Agentic Cloud. The capabilities offered by this machine are already powering the AI revolution unfolding in front of us. Thanks to its advancements over previous microarchitectures and its cost efficiency compared to the H100, the NVIDIA HGX H200 is our recommendation for any training or inference project.