NVIDIA has launched an AI data centre platform that delivers advanced inference
acceleration for voice, video, image and recommendation services.
The NVIDIA TensorRT Hyperscale
Inference Platform features NVIDIA Tesla T4 GPUs based on the NVIDIA
Turing architecture and new inference
software. The platform
enables hyperscale data centres to offer new services, such as enhanced natural language
interactions and direct answers to search queries rather than a list of possible results.
“Our customers are racing toward a future where every product and service will be touched and
improved by AI,” said Ian Buck, VP and GM of Accelerated Computing at
NVIDIA.
“The NVIDIA TensorRT Hyperscale Platform has been built to bring this to reality —
faster and more efficiently than had been previously thought possible.”
Voice queries, translations, images, videos, recommendations and social media interactions
are all processed in data centres, but each application requires a different
type of neural network residing on the server, NVIDIA said. To optimise the data centre for maximum throughput and server utilisation, the NVIDIA
TensorRT Hyperscale Platform includes both real-time inference software and Tesla T4 GPUs,
which process queries up to 40x faster than CPUs alone.
Key platform elements include:
• NVIDIA Tesla T4 GPU – Featuring 320 Turing Tensor Cores and 2,560 CUDA cores, this
new GPU provides breakthrough performance with flexible, multi-precision capabilities. Packaged in an energy-efficient, 75-watt, small PCIe form factor that easily fits into most servers, it offers 65 teraflops of peak FP16 performance, 130 trillion operations per second (TOPS) for INT8 and 260 TOPS for INT4. FP16 refers to 16-bit floating-point data, while INT4 and INT8 stand for four- and eight-bit integer data respectively.
• NVIDIA TensorRT 5 – An inference optimiser and runtime engine, NVIDIA TensorRT 5
supports Turing Tensor Cores and expands the set of neural network optimisations for
multi-precision workloads (a usage sketch follows this list).
• NVIDIA TensorRT inference server – This containerised microservice software enables
applications to use AI models in data centre production. Freely available from the
NVIDIA GPU Cloud container registry, it maximises data centre throughput and GPU
utilisation, supports all popular AI models and frameworks, and integrates with
Kubernetes and Docker.
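To make the multi-precision workflow concrete, here is a minimal sketch of building an engine with the TensorRT 5-generation Python API. It is an illustration under stated assumptions, not code from NVIDIA's announcement: the ONNX file name is hypothetical, and INT8 mode (not shown) additionally requires a calibration dataset.

    import tensorrt as trt

    # Build-time logger; TensorRT reports optimisation diagnostics here.
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    def build_engine(onnx_path):
        """Parse an ONNX model and build a reduced-precision engine."""
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network()
        parser = trt.OnnxParser(network, TRT_LOGGER)
        with open(onnx_path, "rb") as f:
            parser.parse(f.read())
        builder.max_batch_size = 8            # largest batch served at runtime
        builder.max_workspace_size = 1 << 30  # 1 GiB scratch for kernel selection
        # Permit FP16 kernels so Turing Tensor Cores are used where the
        # optimiser judges reduced precision safe; builder.int8_mode would
        # additionally need a calibrator.
        builder.fp16_mode = True
        return builder.build_cuda_engine(network)

    engine = build_engine("resnet50.onnx")  # hypothetical model file

The engine produced this way is what the runtime, or the TensorRT inference server described above, executes; precision choices are made once at build time rather than per request.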
“We are working hard at Microsoft to deliver the most innovative AI-powered services to our
customers,” said Jordi Ribas, Corporate VP for Bing and AI Products at Microsoft.
“Using NVIDIA GPUs in real-time inference workloads has improved Bing’s advanced search
offerings, enabling us to reduce object detection latency for images. We look forward to
working with NVIDIA’s next-generation inference hardware and software to expand the way
people benefit from AI products and services.”
Chris Kleban, Product Manager at Google Cloud, said: “AI is becoming increasingly pervasive,
and inference is a critical capability customers need to successfully deploy their AI models, so
we’re excited to support NVIDIA’s Turing Tesla T4 GPUs on Google Cloud Platform soon.”
“Cisco’s UCS portfolio delivers policy-driven, GPU-accelerated systems and solutions to power
every phase of the AI lifecycle. With the NVIDIA Tesla T4 GPU based on the NVIDIA Turing
architecture, Cisco customers will have access to the most efficient accelerator for AI inference
workloads — gaining insights faster and accelerating time to action,” added Kaustubh Das, VP of product management, Data Center Group, Cisco. Cisco Unified Computing System (UCS) servers unite computing, networking, storage access and virtualisation.
“Dell EMC is focused on helping customers transform their IT while benefiting from
advancements such as artificial intelligence. As the world’s leading provider of server systems,
Dell EMC continues to enhance the PowerEdge server portfolio to help our customers ultimately
achieve their goals. Our close collaboration with NVIDIA and historical adoption of the latest
GPU accelerators available from their Tesla portfolio play a vital role in helping our customers
stay ahead of the curve in AI training and inference,” said Ravi Pendekanti, Senior VP of product management and marketing,
Servers & Infrastructure Systems, Dell EMC.
“Fujitsu plans to incorporate NVIDIA’s Tesla T4 GPUs into our global Fujitsu Server PRIMERGY
systems lineup. Leveraging this latest, high-efficiency GPU accelerator from NVIDIA, we will
provide our customers around the world with servers highly optimised for their growing AI
needs,”
stated Hideaki Maeda, VP of the Products Division, Data Center Platform
Business Unit, Fujitsu Ltd.
“At HPE, we are committed to driving intelligence at the edge for faster insight and improved
experiences. With the NVIDIA Tesla T4 GPU, based on the NVIDIA Turing architecture, we are
continuing to modernise and accelerate the data center to enable inference at the edge,”
said Bill Mannel, VP and GM, HPC and AI Group, Hewlett Packard
Enterprise (HPE).
“IBM Cognitive Systems is able to deliver 4x faster deep learning training times as a result of
co-optimised hardware and software on a simplified AI platform with PowerAI, our deep
learning training and inference software, and IBM Power Systems AC922 accelerated servers.
We have a history of partnership and innovation with NVIDIA, and together we co-developed
the industry’s only CPU-to-GPU NVIDIA NVLink connection on IBM Power processors, and we are
excited to explore the new NVIDIA T4 GPU accelerator to extend this state-of-the-art leadership
for inference workloads,”
said Steve Sibley, VP of Power Systems Offering Management, IBM.
“We are excited to see NVIDIA bring GPU inference to Kubernetes with the NVIDIA TensorRT
inference server, and look forward to integrating it with Kubeflow to provide users with a
simple, portable and scalable way to deploy AI inference across diverse infrastructures,” noted David Aronchick, co-founder and Product Manager of Kubeflow.
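For a sense of what a simple, portable deployment looks like from a client's perspective, here is a minimal sketch that polls a running TensorRT inference server over HTTP. The port and endpoint paths follow the server's documented v1 defaults at the time; the host name and the availability of the requests library are assumptions.

    import requests  # third-party HTTP client, assumed installed

    SERVER = "http://localhost:8000"  # assumed default HTTP port of the server

    # Liveness probe: returns HTTP 200 while the server process is healthy.
    live = requests.get(SERVER + "/api/health/live")
    print("live:", live.status_code == 200)

    # Status endpoint: lists the models loaded from the model repository.
    print(requests.get(SERVER + "/api/status").text)

Because the server is a containerised microservice, the same probes work whether it runs under plain Docker or behind a Kubernetes service.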
“Open source cross-framework inference is vital to production deployments of machine learning
models. We are excited to see how the NVIDIA TensorRT inference server, which brings a
powerful solution for both GPU and CPU inference serving at scale, enables faster deployment
of AI applications and improves infrastructure utilisation,”
commented Kash Iftikhar, VP of product development, Oracle Cloud Infrastructure.
“Supermicro is innovating to address the rapidly emerging high-throughput inference market
driven by technologies such as 5G, smart cities and Internet of Things (IoT) devices, which are generating huge
amounts of data and require real-time decision making. We see the combination of NVIDIA
TensorRT and the new Turing architecture-based T4 GPU accelerator as the ideal combination
for these new, demanding and latency-sensitive workloads and plan to aggressively leverage
them in our GPU system product line,” said Charles Liang, President and CEO, Supermicro, which has a presence in Taiwan.
NVIDIA estimates that the AI inference market is poised to grow to US$20 billion over the next five years.
Details:
Request early access to T4 GPUs