Source: NVIDIA blog. Project MagLev is announced at the Facebook @Scale Conference in the US.
NVIDIA has announced Project MagLev at the Facebook @Scale Conference in the US. Project MagLev is an internally developed artificial intelligence (AI) training and inference infrastructure designed to ease bottlenecks in the end-to-end, industry-grade AI development workflow for autonomous driving.
"We built our AI for autonomous driving, with deep neural networks running simultaneously to handle the full range of real-world conditions — vehicles and pedestrians, lighting at different times of day, sleet, glare, black ice, you name it," Clement Farabet, VP, AI Infrastructure, said in a blog post.
"Implementing the right algorithms to power this AI requires a tremendous amount of research and development. One of the most daunting challenges is verifying the correctness of deep neural networks across all conditions. And it’s not enough to do it once: you have to do so again and again — or 'at scale' — to meet rigorous safety requirements."
Farabet said MagLev must match human performance without error and then exceed it, while handling a 12-camera platform equipped with lidar and radar sensors and millions of miles' worth of data. MagLev will use automation to improve the training and validation of industry-grade AI systems, including petabyte-scale testing, high-throughput data management and labelling, AI-based data selection to build the right datasets, traceability for safety, and end-to-end workflow automation, he said.
"We built every component in MagLev with scale and flexibility in mind. This includes, for example, infrastructure to run hyper-parameter tuning, which is essential to explore more model architectures and training techniques, and find the best possible one. MagLev suggests experiments to run based on several exploration strategies, and can leverage results from past experiments," Farabet said.
"We built our AI for autonomous driving, with deep neural networks running simultaneously to handle the full range of real-world conditions — vehicles and pedestrians, lighting at different times of day, sleet, glare, black ice, you name it," Clement Farabet, VP, AI Infrastructure, said in a blog post.
"Implementing the right algorithms to power this AI requires a tremendous amount of research and development. One of the most daunting challenges is verifying the correctness of deep neural networks across all conditions. And it’s not enough to do it once: you have to do so again and again — or 'at scale' — to meet rigorous safety requirements."
Farabet said MagLev has to match human performance without error and then do even better; handle a 12-camera platform equipped with lidar and radar sensors, and millions of miles' worth of data. MagLev will use automation to improve the training and validation of industry-grade AI systems — including petabyte-scale testing, high-throughput data management and labelling, AI-based data selection to build the right datasets, traceability for safety, and end-to-end workflow automation, he said.
"We built every component in MagLev with scale and flexibility in mind. This includes, for example, infrastructure to run hyper-parameter tuning, which is essential to explore more model architectures and training techniques, and find the best possible one. MagLev suggests experiments to run based on several exploration strategies, and can leverage results from past experiments," Farabet said.
MagLev can also help in building the right training datasets through a technique called active learning, for which it must perform inference at a massive scale.
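As a rough sketch of what active learning by uncertainty sampling involves, the snippet below scores a large unlabelled pool with the current model and keeps only the frames it is least sure about, so that labelling effort goes where it helps most. The model and data here are placeholders, not MagLev code.

```python
import math
import random

def predict_proba(frame_id):
    # Stand-in for massive-scale inference with the current model;
    # returns a fake class distribution for the frame.
    rng = random.Random(frame_id)
    p = rng.random()
    return [p, 1.0 - p]

def entropy(probs):
    # Higher entropy = the model is less certain about this frame.
    return -sum(p * math.log(p + 1e-12) for p in probs)

unlabelled_pool = range(100_000)  # e.g. frames from recorded drives
scored = ((f, entropy(predict_proba(f))) for f in unlabelled_pool)
to_label = sorted(scored, key=lambda s: s[1], reverse=True)[:1_000]
print(f"selected {len(to_label)} most-uncertain frames for labelling")
```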
Farabet noted that inference at this scale is enhanced by innovations within the NVIDIA TensorRT Hyperscale Inference Platform, which was announced at GTC Japan. The new NVIDIA TensorRT inference server provides a containerised, production-ready AI inference server for data centre deployments.
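For a sense of what a containerised inference server changes for applications, the sketch below sends a request to a model served behind an HTTP endpoint. The route and payload shape are assumptions made for illustration only; the TensorRT inference server's actual HTTP/gRPC protocol is documented by NVIDIA.

```python
import json
import urllib.request

SERVER = "http://localhost:8000"  # assumed host/port mapping for the server container

# NOTE: the route and payload below are placeholders for this sketch;
# consult NVIDIA's documentation for the server's real wire format.
payload = json.dumps({"inputs": [[0.1, 0.2, 0.3]]}).encode("utf-8")
req = urllib.request.Request(
    f"{SERVER}/models/my_model/infer",  # hypothetical endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```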
NVIDIA is also working with Kubeflow to make it easy to deploy GPU-accelerated inference across Kubernetes clusters, Farabet said. The combination of the NVIDIA TensorRT inference server and Kubeflow will make production AI inference in the data centre repeatable and scalable, he said.