NVIDIA has announced its next-generation accelerated computing platform built on the NVIDIA Hopper architecture, delivering an order-of-magnitude performance leap over its predecessor. Named for Grace Hopper, a pioneering US computer scientist who was on the team that discovered the first 'computer bug'*, the new architecture succeeds the NVIDIA Ampere architecture launched two years ago.
The company also announced its first Hopper-based GPU, the NVIDIA H100, which packs 80 billion transistors. The world's largest and most powerful accelerator, the H100 features a revolutionary Transformer Engine and a highly scalable NVIDIA NVLink interconnect for demanding workloads such as gigantic artificial intelligence (AI) language models, deep recommender systems, genomics and complex digital twins (virtual copies of physical objects, typically used for training and prediction).
"Data centres are becoming AI factories - processing and refining mountains of data to produce intelligence," said Jensen Huang, founder and CEO of NVIDIA.
"NVIDIA H100 is the engine of the world's AI infrastructure that enterprises use to accelerate their AI-driven businesses."
[Image: The Hopper architecture. Source: NVIDIA.]
The NVIDIA H100 GPU delivers six breakthrough innovations:
World's most advanced chip
Built with 80 billion transistors using a cutting-edge TSMC 4N process designed for NVIDIA's accelerated compute needs, the H100 features major advances in AI, high performance computing (HPC), memory bandwidth, interconnect and communication, including nearly 5 TBps of external connectivity. The H100 is the first GPU to support PCIe Gen5 and the first to use HBM3, the third-generation high-bandwidth memory standard, enabling 3 TBps of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world's Internet traffic, making it possible for customers to deliver advanced recommender systems and large language models running inference on data in real time.
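As a rough back-of-envelope illustration of that last claim - assuming the ~5 TBps figure refers to each H100's total external bandwidth - twenty GPUs aggregate as follows:

```python
# Back-of-envelope check of the "twenty H100s vs world Internet traffic"
# claim, assuming ~5 TBps of external connectivity per H100 (the figure
# quoted in this article).
per_gpu_tb_per_s = 5        # TBps of external connectivity per H100
gpus = 20

aggregate_tb_per_s = per_gpu_tb_per_s * gpus     # terabytes per second
aggregate_tbit_per_s = aggregate_tb_per_s * 8    # convert to terabits

print(f"{aggregate_tb_per_s} TBps aggregate (~{aggregate_tbit_per_s} Tbps)")
# -> 100 TBps aggregate, i.e. ~800 Tbps, on the order of estimates of
#    global Internet traffic at the time of the announcement.
```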
New Transformer engine
The Transformer deep learning model is now the standard choice for natural language processing. The H100 accelerator's Transformer Engine is built to speed up these networks by as much as 6x versus the previous generation, without losing accuracy.
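For orientation, the sketch below builds the kind of Transformer network the engine accelerates. It uses stock PyTorch rather than NVIDIA's Transformer Engine library, and all layer dimensions are hypothetical:

```python
# A minimal sketch of the class of Transformer network the H100's
# Transformer Engine is built to accelerate. Stock PyTorch, purely for
# illustration; sizes below are hypothetical, not any specific model.
import torch
import torch.nn as nn

layer = nn.TransformerEncoderLayer(
    d_model=1024,          # embedding width
    nhead=16,              # attention heads
    dim_feedforward=4096,  # width of the per-token MLP
    batch_first=True,
)
encoder = nn.TransformerEncoder(layer, num_layers=24)

tokens = torch.randn(8, 512, 1024)   # (batch, sequence, embedding)
with torch.no_grad():
    out = encoder(tokens)
print(out.shape)  # torch.Size([8, 512, 1024])
```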
Second-generation secure multi-instance GPU (MIG)
MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs. The Hopper architecture extends MIG capabilities by up to 7x over the previous generation, offering secure multitenant configurations in cloud environments across each GPU instance.
Confidential computing
H100 is the world's first accelerator with confidential computing capabilities. It can protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures.
Fourth-generation NVIDIA NVLink
To accelerate the largest AI models, NVLink combines with a new external NVLink switch to extend NVLink as a scale-up network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth than the previous generation, which used NVIDIA HDR Quantum InfiniBand.
DPX instructions
New DPX instructions accelerate dynamic programming - used in a broad range of algorithms, including route optimisation and genomics - by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. This includes the Floyd-Warshall algorithm to find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm used in sequence alignment for DNA, and protein classification and folding.
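As a concrete reference point, here is the Floyd-Warshall recurrence mentioned above as a plain Python implementation - the kind of CPU baseline that DPX instructions are designed to outpace. The example graph is a hypothetical set of warehouse waypoints:

```python
# Floyd-Warshall all-pairs shortest paths, one of the dynamic-programming
# algorithms the article cites for DPX acceleration. A plain CPU reference
# implementation for illustration only.
INF = float("inf")

def floyd_warshall(dist):
    """dist[i][j]: edge weight from i to j (INF if no edge). In place."""
    n = len(dist)
    for k in range(n):              # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical example: 4 warehouse waypoints with directed travel costs.
graph = [
    [0,   3,   INF, 7  ],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1  ],
    [2,   INF, INF, 0  ],
]
for row in floyd_warshall(graph):
    print(row)
```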
The combined technology innovations of H100 extend NVIDIA's AI inference and training leadership to enable real-time and immersive applications using giant-scale AI models. The H100 will enable chatbots using the world's most powerful monolithic transformer language model, Megatron 530B, with up to 30x higher throughput than the previous generation, while meeting the subsecond latency required for real-time conversational AI. H100 also allows researchers and developers to train massive models such as Mixture of Experts - with 395 billion parameters - up to 9x faster, reducing the training time from weeks to days.
NVIDIA H100 is designed for every type of data centre, including on-premises, cloud, hybrid-cloud and edge. It is expected to be available worldwide later this year from the world's leading cloud service providers and computer makers, as well as directly from NVIDIA.
NVIDIA's fourth-generation DGX system, DGX H100, features eight H100 GPUs to deliver 32 petaflops of AI performance at the new 8-bit floating point (FP8) precision, providing the scale to meet the massive compute requirements of large language models, recommender systems, healthcare research and climate science. While bigger numbers generally signal technology advances, the numeric precision used for AI has been decreasing, because lower-precision formats require less storage and use energy more efficiently.
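To make the storage side of that trade-off concrete, here is a quick sketch of the memory needed just to hold model weights at different precisions, using the 530-billion-parameter Megatron model cited above:

```python
# Rough storage cost of model weights at different numeric precisions,
# illustrating why lower precision (like FP8) matters at large scale.
# Parameter count is the Megatron 530B figure cited in the article.
params = 530e9

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    tb = params * bytes_per_param / 1e12
    print(f"{name}: {tb:.1f} TB of weights")
# FP32: 2.1 TB, FP16: 1.1 TB, FP8: 0.5 TB -- each halving of precision
# halves the memory footprint and the bandwidth needed to move weights.
```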
Every GPU in DGX H100 systems is connected by a fourth-generation NVLink, providing 900 GBps connectivity, 1.5x more than the prior generation. NVSwitch enables all eight of the H100 GPUs to connect over NVLink. An external NVLink switch can network up to 32 DGX H100 nodes in the next-generation NVIDIA DGX SuperPOD supercomputers.
The H100 has received broad industry acceptance. Cloud service providers Alibaba Cloud, Amazon Web Services, Baidu AI Cloud, Google Cloud, Microsoft Azure, Oracle Cloud and Tencent Cloud plan to offer H100-based instances, while servers with H100 accelerators can be expected from major systems manufacturers including Atos, BOXX Technologies, Cisco, Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, Lenovo, Nettrix and Supermicro.
H100 will come in SXM and PCIe form factors to support a wide range of server design requirements. A converged accelerator will also be available, pairing an H100 GPU with an NVIDIA ConnectX-7 400 Gbps InfiniBand and Ethernet SmartNIC.
NVIDIA's H100 SXM will be available in HGX H100 server boards with four- and eight-way configurations for enterprises with applications scaling to multiple GPUs in a server and across multiple servers. HGX H100-based servers deliver the highest application performance for AI training and inference along with data analytics and HPC applications.
The H100 PCIe, with NVLink to connect two GPUs, provides more than 7x the bandwidth of PCIe 5.0, delivering outstanding performance for applications running on mainstream enterprise servers. Its form factor makes it easy to integrate into existing data centre infrastructure.
The H100 CNX, a new converged accelerator, couples an H100 with a ConnectX-7 SmartNIC to support input/output (I/O)-intensive applications such as multinode AI training in enterprise data centres and 5G signal processing at the edge.
NVIDIA Hopper architecture-based GPUs can also be paired with NVIDIA Grace CPUs with an ultra-fast NVLink-C2C interconnect for over 7x faster communication between the CPU and GPU compared to PCIe 5.0. This combination - the Grace Hopper Superchip - is an integrated module designed to serve giant-scale HPC and AI applications.
The NVIDIA H100 GPU is supported by significantly updated software tools, including the NVIDIA AI suite of software for workloads such as speech, recommender systems and hyperscale inference.
NVIDIA also released more than 60 updates to its CUDA-X collection of libraries, tools and technologies to accelerate work in quantum computing and 6G research, cybersecurity, genomics and drug discovery.
Details:
The NVIDIA H100 will be available from Q3 2022.
Explore:
Watch the GTC 2022 keynote by NVIDIA CEO Jensen Huang, and register for GTC 2022 for free to attend sessions with NVIDIA and industry leaders.
*The first computer bug was literally an insect that had caused a system to stop working.
