
Thursday, 14 May 2020

NVIDIA's Ampere architecture rewrites data centre infrastructure requirements

In his online keynote at GTC 2020, NVIDIA founder and Chief Executive Jensen Huang announced a slew of innovations that rewrite the infrastructure wishlist for data centres worldwide:

- The NVIDIA A100, the first GPU based on the NVIDIA Ampere architecture.

- The NVIDIA DGX A100 system, which features eight NVIDIA A100 GPUs interconnected with NVIDIA NVLink.

- Two products for its EGX Edge AI platform — the EGX A100 for larger commercial off-the-shelf servers and the tiny EGX Jetson Xavier NX for micro-edge servers — delivering high-performance, secure AI processing at the edge.

NVIDIA A100

Source: NVIDIA. The NVIDIA A100.
The NVIDIA A100 is in full production and shipping to customers worldwide. The A100 draws on design breakthroughs in the NVIDIA Ampere architecture — the company’s largest generational performance leap to date across its eight generations of GPUs — to unify artificial intelligence (AI) training and inference, boosting performance by up to 20x over its predecessors.

A universal workload accelerator, the A100 is also built for data analytics, scientific computing and cloud graphics.

“The powerful trends of cloud computing and AI are driving a tectonic shift in data centre designs so that what was once a sea of CPU-only servers is now GPU-accelerated computing,” said Huang.

“NVIDIA A100 GPU is a 20x AI performance leap and an end-to-end machine learning accelerator — from data analytics to training to inference. For the first time, scale-up and scale-out workloads can be accelerated on one platform. NVIDIA A100 will simultaneously boost throughput and drive down the cost of data centres.”

The world’s leading cloud service providers and systems builders that expect to incorporate A100 GPUs into their offerings include: Alibaba Cloud, Amazon Web Services (AWS), Atos, Baidu Cloud,
Cisco, Dell Technologies, Fujitsu, GIGABYTE, Google Cloud, H3C, Hewlett Packard Enterprise
(HPE), Inspur, Lenovo, Microsoft Azure, Oracle, Quanta/QCT, Supermicro, and Tencent Cloud. 

Among the first to tap into the power of NVIDIA A100 GPUs is Microsoft, which will take
advantage of their performance and scalability.

“Microsoft trained Turing Natural Language Generation, the largest language model in the world,
at scale using the current generation of NVIDIA GPUs,” said Mikhail Parakhin, Corporate VP, Microsoft Corp.

DoorDash, an on-demand food platform serving as a lifeline to restaurants during the pandemic,
notes the importance of having a flexible AI infrastructure.  

“Modern and complex AI training and inference workloads that require a large amount of data
can benefit from state-of-the-art technology like NVIDIA A100 GPUs, which help reduce model
training time and speed up the machine learning development process,” said Gary Ren, Machine Learning Engineer at DoorDash, a food delivery service with a presence in Australia.

Other early adopters include national laboratories and some of the world’s leading higher education and research institutions, each using the A100 to power their next-generation supercomputers.

A100 breakthroughs

The NVIDIA A100 GPU features five key innovations:
 
● NVIDIA Ampere architecture — The NVIDIA Ampere GPU architecture contains more than 54 billion transistors, making it the world’s largest 7-nanometre processor.

● Third-generation Tensor Cores with TF32* — NVIDIA’s popular Tensor Cores are now more flexible, faster and easier to use. Their expanded capabilities include new TF32 for AI, which allows for up to 20x the AI performance of FP32* precision without any code changes (see the first sketch after this list). In addition, Tensor Cores now support FP64*, delivering up to 2.5x more compute than the previous generation for high-performance computing (HPC) applications.

● Multi-instance GPU — Also known as MIG, this new feature enables a single A100 GPU to be partitioned into as many as seven separate GPU instances, so it can deliver varying degrees of compute for jobs of different sizes, providing optimal utilisation and maximising return on investment (see the second sketch after this list).

● Third-generation NVIDIA NVLink — Doubles the high-speed connectivity between GPUs
to provide efficient performance scaling in a server.

● Structural sparsity — This new efficiency technique harnesses the inherently sparse nature of AI math to double performance (see the third sketch after this list).
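
For developers, TF32 is designed to work without code changes. A minimal sketch, assuming PyTorch 1.7 or later, whose TF32 switches are my illustration rather than anything in NVIDIA's announcement:

    # Minimal sketch: running FP32 code on TF32 Tensor Cores with PyTorch.
    # Assumes PyTorch >= 1.7 and an Ampere-class GPU such as the A100.
    import torch

    torch.backends.cuda.matmul.allow_tf32 = True  # let matmuls use TF32 Tensor Cores
    torch.backends.cudnn.allow_tf32 = True        # let cuDNN convolutions do the same

    a = torch.randn(4096, 4096, device="cuda")    # ordinary FP32 tensors
    b = torch.randn(4096, 4096, device="cuda")
    c = a @ b  # executed in TF32 on Ampere; the model code is unchanged

    # Setting both flags to False forces full FP32, e.g. for accuracy checks.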
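
MIG partitions are created by an administrator with NVIDIA's management tools; applications then simply see smaller GPUs. As an illustrative sketch only, assuming the NVML Python bindings (pynvml, not mentioned in the announcement), a process can query MIG mode and enumerate its instances:

    # Illustrative sketch: query MIG state via NVML's Python bindings
    # (pip install nvidia-ml-py). Assumes an administrator has already
    # enabled MIG and created instances, e.g. with nvidia-smi.
    import pynvml

    pynvml.nvmlInit()
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

    current, pending = pynvml.nvmlDeviceGetMigMode(gpu)  # raises NVMLError on pre-MIG GPUs
    print("MIG enabled:", bool(current))

    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError:
            continue  # this slot holds no MIG instance
        print("instance", i, pynvml.nvmlDeviceGetName(mig))

    pynvml.nvmlShutdown()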
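
The sparsity technique exploits a fine-grained pattern in which two of every four consecutive weights are zero, letting the Tensor Cores skip the zeros. A plain-NumPy sketch of the pruning pattern itself, as an illustration rather than NVIDIA's tooling:

    # Illustrative sketch of 2:4 structured sparsity: keep the two
    # largest-magnitude weights in each aligned group of four and zero
    # the rest. Real workflows use NVIDIA's libraries and retraining
    # recipes rather than this one-shot pruning.
    import numpy as np

    def prune_two_of_four(weights: np.ndarray) -> np.ndarray:
        w = weights.reshape(-1, 4).copy()
        drop = np.argsort(np.abs(w), axis=1)[:, :2]  # two smallest magnitudes per group
        np.put_along_axis(w, drop, 0.0, axis=1)
        return w.reshape(weights.shape)

    w = np.random.randn(2, 8).astype(np.float32)
    print(prune_two_of_four(w))  # each group of four now contains two zeros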

Together, these new features make the NVIDIA A100 ideal for diverse, demanding workloads,
including AI training and inference as well as scientific simulation, conversational AI,
recommender systems, genomics, high-performance data analytics, seismic modelling, and
financial forecasting. 

NVIDIA A100 in new systems

The NVIDIA DGX A100 system, also announced today, features eight NVIDIA A100 GPUs
interconnected with NVIDIA NVLink. It is available immediately from NVIDIA and approved
partners. 


Additionally, a wide range of A100-based servers are expected from leading systems
manufacturers, including Atos, Cisco, Dell Technologies, Fujitsu, GIGABYTE, H3C, HPE, Inspur,
Lenovo, Quanta/QCT, and Supermicro.

To help accelerate development of servers from its partners, NVIDIA has created HGX A100 — a server building block in the form of integrated baseboards in four-GPU and eight-GPU configurations.

Alibaba Cloud, AWS, Baidu Cloud, Google Cloud, Microsoft Azure, Oracle, and Tencent Cloud are
planning to offer A100-based services. 

Software optimisations

NVIDIA also announced several updates to its software stack, enabling application developers to
take advantage of the A100 GPU’s innovations. They include new versions of more than 50 CUDA-X
libraries used to accelerate graphics, simulation and AI; CUDA 11; NVIDIA Jarvis, a multimodal,
conversational AI services framework; NVIDIA Merlin, a deep recommender application
framework; and the NVIDIA HPC software development kit (SDK), which includes compilers, libraries and tools that help HPC developers debug and optimise their code for A100. 

Details:

The NVIDIA DGX A100 is available immediately from NVIDIA and approved partners.

*TensorFloat-32 (TF32) is a new math mode in NVIDIA A100 GPUs for handling the matrix math, also called tensor operations, used at the heart of AI and certain HPC applications. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on NVIDIA's existing Volta GPUs. FP64 refers to 64-bit double precision math, another math mode.
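
As a rough numerical illustration of the footnote above (my own sketch, not NVIDIA's specification): TF32 keeps FP32's 8-bit exponent but carries a 10-bit mantissa, so its precision can be approximated by discarding the 13 low mantissa bits of an FP32 value (truncating rather than rounding, for simplicity):

    # Rough illustration: approximate TF32 precision by clearing the 13
    # least-significant mantissa bits of an FP32 value (FP32 carries 23
    # explicit mantissa bits, TF32 keeps 10). Truncates instead of
    # rounding, so this is only an approximation of the hardware.
    import struct

    def to_tf32ish(x: float) -> float:
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        bits &= ~((1 << 13) - 1)  # drop the low 13 mantissa bits
        return struct.unpack("<f", struct.pack("<I", bits))[0]

    print(to_tf32ish(3.14159265))  # 3.140625, coarser than FP32's 3.1415927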
