
Monday, 28 May 2018

NVIDIA’s Volta Tensor Core GPU: the world’s fastest AI processor

Source: NVIDIA blog post. The Volta Tensor Core GPU achieves speed records in ResNet-50.

NVIDIA designed the Volta Tensor Core architecture to meet the appetite for faster artificial intelligence (AI) computing, and the performance improvements have been significant.

Loyd Case, Editor of NVIDIA's Developer Relations Blog, says in a blog post that the Volta Tensor Core architecture is optimised for the wide range of deep learning models in use today. "NVIDIA’s Tensor Core GPU architecture built into Volta GPUs represents a huge advancement in the NVIDIA deep learning platform. This new hardware accelerates computation of matrix multiplies and convolutions, which account for most of the computational operations when training a neural network," he said.
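
The acceleration Case describes is exposed to developers through the CUDA libraries (cuBLAS, cuDNN) and, indirectly, through deep learning frameworks. As a rough illustration only (PyTorch is my assumption here, not something the post mentions), a half-precision matrix multiply on a Volta GPU is dispatched to Tensor Core kernels:

```python
# Minimal sketch, not from the NVIDIA post: on a Volta GPU, cuBLAS routes
# half-precision matrix multiplies through the Tensor Cores automatically.
import torch

assert torch.cuda.is_available(), "requires an NVIDIA GPU"

# FP16 inputs whose dimensions are multiples of 8 are the friendliest
# case for Tensor Core execution.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

c = a @ b                  # dispatched to Tensor Core GEMM kernels on Volta
torch.cuda.synchronize()   # wait for the asynchronous kernel to finish
print(c.dtype, c.shape)    # torch.float16, (4096, 4096)
```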

Today NVIDIA’s Volta Tensor Core GPU is the world’s fastest processor for AI, delivering 125 teraflops of deep learning performance with a single chip:

- A single V100 Tensor Core GPU achieves 1,075 images/second (ips) when training ResNet-50, a 4x performance increase compared to the previous-generation Pascal GPU (a measurement sketch follows this list).

- A single DGX-1 server powered by eight Tensor Core V100s achieves 7,850 ips, almost 2x the 4,200 ips from a year ago on the same system.

- A single AWS P3 cloud instance powered by eight Tensor Core V100s can train ResNet-50 in less than three hours today, 3x faster than a Google tensor processing unit (TPU), Google's equivalent of the Tensor Core GPU.
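
Images-per-second figures like those above are typically obtained by timing a fixed number of training iterations and dividing the images processed by the elapsed time. The sketch below, using torchvision's ResNet-50 and synthetic data (both my assumptions, not NVIDIA's benchmark harness), shows the general shape of such a measurement:

```python
# Rough throughput sketch: time a few mixed-precision training steps of
# ResNet-50 on synthetic data and report images/second.
import time
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()          # loss scaling for FP16 training
criterion = torch.nn.CrossEntropyLoss()

batch_size, steps = 256, 50
images = torch.randn(batch_size, 3, 224, 224, device="cuda")
labels = torch.randint(0, 1000, (batch_size,), device="cuda")

torch.cuda.synchronize()
start = time.time()
for _ in range(steps):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # FP16 compute engages Tensor Cores
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
torch.cuda.synchronize()

elapsed = time.time() - start
print(f"{batch_size * steps / elapsed:.0f} images/second")
```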

"NVIDIA Tensor Core GPU architecture allows us to simultaneously provide greater performance than single-function ASICs, yet be programmable for diverse workloads. For instance, each Tesla V100 Tensor Core GPU delivers 125 teraflops of performance for deep learning compared to 45 teraflops by a Google TPU chip. Four TPU chips in a ‘cloud TPU’ deliver 180 teraflops of performance; by comparison, four V100 chips deliver 500 teraflops of performance," Case said.

NVIDIA holds the speed record for the fastest single cloud instance, Case shared. "Jeremy Howard and researchers at fast.ai incorporated key algorithmic innovations and tuning techniques to train ResNet-50 on ImageNet in just three hours on a single AWS P3 instance, powered by eight V100 Tensor Core GPUs. ResNet-50 ran three times faster than on a TPU-based cloud instance, which takes close to nine hours to train ResNet-50," said Case.
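
Among the tuning techniques fast.ai has publicly described for this kind of fast ImageNet training is the one-cycle learning-rate policy. The sketch below uses PyTorch's built-in OneCycleLR scheduler with placeholder hyperparameters; treat it as an illustration of the sort of tuning involved, not a reproduction of fast.ai's actual training script.

```python
# Illustrative only: a one-cycle learning-rate schedule of the kind fast.ai
# has described. All values are placeholders, not fast.ai's hyperparameters.
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

epochs, steps_per_epoch = 30, 5005            # placeholder numbers
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1.0,                               # peak learning rate (placeholder)
    epochs=epochs,
    steps_per_epoch=steps_per_epoch,
)

# Inside the training loop the learning rate ramps up to max_lr and then
# anneals back down, one scheduler step per batch:
# for batch in loader:
#     train_step(batch)
#     scheduler.step()
print(scheduler.get_last_lr())
```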

"We further expect the methods described in this blog to improve throughput will be applicable to other approaches such as fast.ai’s and will help them converge even faster."

"We will continue to optimise through the entire stack and continue to deliver exponential performance gains to equip the AI community with the tools for driving deep learning innovation forward," he added.

"We’ll soon be combining 16 Tesla V100s into a single server node to create the world’s fastest computing server, offering 2 petaflops of performance."
