Monday, 3 August 2020

Google claims world's fastest training supercomputer title

The latest results from the MLPerf benchmark competition show that Google has built the world’s fastest machine-learning (ML) training supercomputer, the company said in a blog post. Google set performance records in six of the eight MLPerf benchmarks, according to Naveen Kumar, a Test Engineer at Google.

Source: Google blog post. Google sets six large-scale training performance records in MLPerf v0.7. Speed comparisons for Google’s best MLPerf Training v0.7 research submission over the fastest non-Google submission in any availability category. Comparisons are normalised by overall training time regardless of system size, which ranges from eight to 4,096 chips. Taller bars are better*. 

DLRM represents ranking and recommendation models that are core to online businesses from media to travel to e-commerce. BERT enabled Google Search’s “biggest leap forward in the past five years”. Transformer is the foundation of a wave of recent advances in natural language processing, including BERT. ResNet-50 is a widely-used model for image classification. SSD is an object detection model that’s lightweight enough to run on mobile devices. Mask R-CNN is a widely-used image segmentation model that can be used in autonomous navigation, medical imaging, and other domains.
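For readers less familiar with these reference models, the sketch below shows how one of them, a ResNet-50 classifier, can be instantiated in a few lines with TensorFlow’s Keras API. This is an illustrative sketch only, not the MLPerf reference implementation or Google’s submission code.

```python
import tensorflow as tf

# Instantiate a ResNet-50 image classifier with randomly initialised weights,
# configured for 1000-class, ImageNet-style classification.
model = tf.keras.applications.ResNet50(weights=None, classes=1000)

# Compile with a generic optimiser and loss; the MLPerf reference
# implementations use their own carefully tuned training recipes.
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```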

"We achieved these results with ML model implementations in TensorFlow, JAX, and Lingvo. Four of the eight models were trained from scratch in under 30 seconds. To put that in perspective, consider that in 2015, it took more than three weeks to train one of these models on the most advanced hardware accelerator available. Google’s latest TPU supercomputer can train the same model almost five orders of magnitude faster just five years later," noted Kumar.

TensorFlow is Google’s end-to-end open-source machine learning framework; Lingvo is a high-level framework for sequence models built on top of TensorFlow; and JAX is a new research-focused framework based on composable function transformations, he explained. TPU stands for Tensor Processing Unit.
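To make “composable function transformations” concrete, here is a minimal JAX sketch (illustrative only, not code from Google’s submissions) in which jit, grad, and vmap are composed on an ordinary Python function:

```python
import jax
import jax.numpy as jnp

# A plain Python function: a toy mean-squared-error loss.
def loss(w, x, y):
    return jnp.mean((jnp.dot(x, w) - y) ** 2)

# Composable transformations: differentiate, vectorise, and JIT-compile
# (on a TPU/GPU backend if one is available, otherwise CPU).
grad_loss = jax.jit(jax.grad(loss))               # compiled gradient w.r.t. w
batched = jax.vmap(loss, in_axes=(None, 0, 0))    # loss evaluated per example

w = jnp.ones(3)
x = jnp.arange(12.0).reshape(4, 3)
y = jnp.ones(4)
print(grad_loss(w, x, y), batched(w, x, y))
```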

The supercomputer Google used for this MLPerf Training round is four times larger than the Cloud TPU v3 Pod that set three records in the previous competition, Kumar noted. The system includes 4,096 TPU v3 chips and hundreds of CPU host machines, all connected via an ultra-fast, ultra-large-scale custom interconnect. In total, this system delivers over 430 PFLOPS** of peak performance.
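For rough context, that figure is consistent with a peak throughput in the neighbourhood of 105 TFLOPS per TPU v3 chip. The per-chip number in the sketch below is an assumption used only for this sanity check, not a figure from the post:

```python
# Back-of-the-envelope check of the quoted peak-performance figure.
# The ~105 TFLOPS-per-chip value is an assumption for illustration only.
chips = 4096
tflops_per_chip = 105          # assumed peak bfloat16 TFLOPS per TPU v3 chip

peak_pflops = chips * tflops_per_chip / 1000
print(f"peak ≈ {peak_pflops:.0f} PFLOPS")   # ≈ 430 PFLOPS
```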

"Google’s MLPerf Training v0.7 submissions demonstrate our commitment to advancing machine learning research and engineering at scale and delivering those advances to users through open-source software, Google’s products, and Google Cloud," Kumar said.

Google’s second-generation and third-generation TPU supercomputers are available through Google Cloud today, and Cloud TPUs support both TensorFlow and PyTorch.
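As an illustration of what that looks like in practice, the following is a minimal TensorFlow 2.x sketch for targeting a Cloud TPU; the empty TPU address, the toy model, and the hyperparameters are placeholders, assuming a TPU VM or Colab-style environment where a TPU is reachable, and are not part of any MLPerf submission:

```python
import tensorflow as tf

# Connect to an attached Cloud TPU; the empty tpu="" address assumes a TPU VM
# or Colab runtime where the TPU is locally resolvable.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Distribute model building and training across all TPU cores.
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
# model.fit(...) would then run the training loop on the TPU.
```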

*All results retrieved from www.mlperf.org on July 29, 2020. Chart compares results: 0.7-70 v. 0.7-17, 0.7-66 v. 0.7-31, 0.7-68 v. 0.7-39, 0.7-68 v. 0.7-34, 0.7-66 v. 0.7-38, 0.7-67 v. 0.7-29.

**FLOPS stands for floating point operations per second. One quadrillion (10^15) FLOPS equal one petaFLOPS (PFLOPS).
