Thursday, 30 July 2020

NVIDIA breaks 16 AI performance records in new MLPerf Benchmarks

NVIDIA delivers the world’s fastest artificial intelligence (AI) training performance among commercially available products, according to just-released MLPerf Benchmarks.

NVIDIA’s new DGX SuperPOD, built in less than a month and featuring more than 2,000 NVIDIA A100 GPUs, swept every MLPerf Benchmark category for at-scale performance among commercially available products. The A100 Tensor Core GPU demonstrated the fastest performance per accelerator on all eight MLPerf Benchmarks.

For overall fastest time-to-solution at scale, the DGX SuperPOD system, a cluster of DGX A100 systems connected with HDR InfiniBand*, also set eight new performance milestones. This is NVIDIA's third consecutive, and strongest, showing in training tests from MLPerf, an industry benchmarking group formed in 2018.

NVIDIA was the only company to field commercially available products for all the tests. Most other submissions used the preview category, for products that may not be available for several months, or the research category, for products not expected to be available for some time.

Paresh Kharya, Senior Director of Product Management, Data Center Computing at NVIDIA, called MLPerf the "gold standard for AI benchmarking", and said that the results speak to the readiness of the NVIDIA platform, spanning its AI stack, software, hardware and the wider ecosystem, to capture AI opportunities.

"We are very well positioned," he said when asked about NVIDIA's ability to capture opportunities in AI. "The latest benchmarks prove NVIDIA continues to lead in AI performance."

"What really matters to customers, at the end of the day for AI training, is how fast they can create AI models," he elaborated.

Source: NVIDIA. NVIDIA’s new DGX SuperPOD has swept every MLPerf Benchmark category for at-scale performance among commercially-available products.
The A100 is the first GPU based on the NVIDIA Ampere architecture. “Users across the globe are applying the A100 to tackle the most complex challenges in AI, data science and scientific computing... All are enjoying the greatest generational performance leap in eight generations of NVIDIA GPUs,” Kharya said.

The results demonstrate a significant performance gain in just 1.5 years, Kharya added. “The latest results demonstrate NVIDIA’s focus on continuously evolving an AI platform that spans processors, networking, software and systems. For example, the tests show that at equivalent throughput rates, today’s DGX A100 system delivers up to 4x the performance of the system that used V100 GPUs in the first round of MLPerf training tests,” he said.

Meanwhile, the original DGX-1 system based on NVIDIA V100 can now deliver up to 2x higher performance thanks to the latest software optimisations, he said. These gains came in under two years. As added context, NVIDIA set six records in the first MLPerf Training Benchmarks in December 2018 and eight in July 2019.

Companies are already reaping the benefits of these performance highs. Alibaba hit a US$38 billion sales record on Singles Day in November 2019 using NVIDIA GPUs instead of CPUs to deliver more than 100x more queries per second on its recommendation systems, while DGX SuperPODs are driving business results for companies like Lockheed Martin in aerospace and Microsoft in cloud-computing services.

Of the nine companies submitting results, seven submitted with NVIDIA GPUs including cloud service providers (Alibaba Cloud, Google Cloud, Tencent Cloud) and server makers (Dell, Fujitsu, and Inspur), highlighting the strength of NVIDIA’s ecosystem. The MLPerf partners represent part of an ecosystem of nearly two dozen cloud-service providers and original equipment manufacturers (OEMs) with products or plans for online instances, servers and PCIe cards using NVIDIA A100 GPUs.

Much of the software that NVIDIA and its partners used for the latest MLPerf Benchmarks is available today on NGC, NVIDIA’s software hub. NGC hosts GPU-optimised containers, software scripts, pretrained models and software development kits (SDKs). Kharya emphasised that the NGC software is free, constantly updated, and certified to run well with specific hardware.

*HDR InfiniBand enables extremely low latencies and high data throughput, while offering smart deep learning computing acceleration engines via Scalable Hierarchical Aggregation and Reduction Protocol (SHARP) technology.
