This FPGA-based deep learning acceleration platform is designed to deliver “real-time artificial intelligence (AI),” allowing cloud infrastructure to process and transmit data as fast as it arrives, with ultra-low latency. Microsoft is currently working to deploy Project Brainwave in the Azure cloud so that customers can eventually run complex deep learning models at record-setting performance.
In the cloud, delivering real-time AI is becoming more important as systems are required to process live data streams, such as video, sensor feeds and search queries, and rapidly deliver results back to users, Intel explains.
"For Microsoft’s engineers and data scientists, the biggest challenge here is not how fast they can train the AI model, but how they can apply AI algorithms to massive data streams in real time across a range of data types. This is a much more difficult, and computationally advanced, application than batch-processed AI or other more latency-tolerant AI applications. Real-time AI requires a special mix of software-like flexibility and hardware-like acceleration technologies in the supporting IT systems," says Dan McNamara, Corporate VP and GM of the Programmable Solutions Group (PSG) at Intel, in a blog post.
[Image: Intel Stratix 10 FPGAs and SoC FPGAs leverage Intel’s 14nm process. Source: Intel.]
"Many silicon AI accelerators today require grouping multiple requests together (called batching) to achieve high performance. Project Brainwave, leveraging the Intel Stratix 10 technology, demonstrated over 39 Teraflops of achieved performance on a single request, setting a new standard in the cloud for real-time AI computation. Stratix 10 FPGAs sets a new level of cloud performance for real-time AI computation, with record low latency, record performance and batch-free execution of AI requests," McNamara said.
“We exploit the flexibility of Intel FPGAs to incorporate new innovations rapidly, while offering performance comparable to, or greater than, many ASIC-based deep learning processing units,” said Doug Burger, Distinguished Engineer at Microsoft Research NExT, in a Microsoft Research blog post.
Burger said that the Project Brainwave system is built with three main layers:
- A high-performance, distributed system architecture;
- A hardware DNN engine synthesised onto FPGAs; and
- A compiler and runtime for low-friction deployment of trained models.
Because high-performance FPGAs are attached directly to the Microsoft data centre network, DNNs can be mapped to a pool of remote FPGAs and called by a server with no software in the loop. "This system architecture both reduces latency, since the CPU does not need to process incoming requests, and allows very high throughput, with the FPGA processing requests as fast as the network can stream them," Burger said.
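As a rough illustration of that serving path, the Python sketch below round-robins single inference requests across a pool of network-attached FPGA endpoints. Everything here (the addresses, the length-prefixed wire format, the `infer` helper) is a hypothetical stand-in for illustration, not Microsoft's actual protocol.

```python
import itertools
import socket

# Hypothetical pool of network-attached FPGA endpoints (placeholder addresses).
FPGA_POOL = [("10.0.0.11", 9000), ("10.0.0.12", 9000)]
_endpoints = itertools.cycle(FPGA_POOL)

def infer(request: bytes) -> bytes:
    """Ship one serialized tensor to the next FPGA in the pool and
    return its response. Framing is a simple 4-byte length prefix;
    a production client would also handle short reads on the header."""
    host, port = next(_endpoints)
    with socket.create_connection((host, port), timeout=1.0) as s:
        s.sendall(len(request).to_bytes(4, "big") + request)
        size = int.from_bytes(s.recv(4), "big")
        reply = b""
        while len(reply) < size:
            chunk = s.recv(size - len(reply))
            if not chunk:
                raise ConnectionError("FPGA endpoint closed early")
            reply += chunk
        return reply
```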
Second, Project Brainwave uses a DNN processing unit (DPU) whose design can be customised close to the time it is deployed. Unlike hardcoded DPU chips, which must be finalised at design time, Project Brainwave can define narrow-precision data types for FPGAs that increase performance without meaningful loss in model accuracy, and can incorporate research innovations into the hardware within a few weeks.
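To make "narrow-precision data types" concrete, here is a minimal sketch of rounding weights to a hypothetical low-bit floating-point format (1 sign bit, 5 exponent bits, 2 mantissa bits). The bit widths are assumptions for illustration; the exact formats Brainwave uses are not described in this article.

```python
import numpy as np

def quantize_narrow_float(x, exp_bits=5, man_bits=2):
    """Round values to a hypothetical narrow float format:
    1 sign bit, exp_bits exponent bits, man_bits mantissa bits."""
    sign = np.sign(x)
    man, exp = np.frexp(np.abs(x))       # |x| = man * 2**exp, man in [0.5, 1)
    scale = 2.0 ** man_bits
    man = np.round(man * scale) / scale  # keep man_bits of mantissa precision
    bias = 2 ** (exp_bits - 1) - 1
    exp = np.clip(exp, -bias + 1, bias)  # clamp to representable exponents
    return sign * np.ldexp(man, exp)

weights = np.random.randn(4, 4).astype(np.float32)
q = quantize_narrow_float(weights)
print("max rounding error:", np.max(np.abs(weights - q)))
```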
Third, Project Brainwave incorporates a software stack designed to support popular deep learning frameworks. "We already support Microsoft Cognitive Toolkit and Google’s TensorFlow, and plan to support many others. We have defined a graph-based intermediate representation, to which we convert models trained in the popular frameworks, and then compile down to our high-performance infrastructure," Burger said.
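The graph-based intermediate representation Burger mentions can be pictured as a set of named operations with explicit data dependencies, which a backend compiler then lowers to FPGA instructions. The toy Python sketch below is invented to illustrate the concept; the node types and structure do not reflect Microsoft's actual IR.

```python
from dataclasses import dataclass, field

@dataclass
class IRNode:
    name: str
    op: str                 # e.g. "matmul", "add", "relu"
    inputs: list = field(default_factory=list)

@dataclass
class IRGraph:
    nodes: list = field(default_factory=list)

    def add(self, name, op, inputs=()):
        self.nodes.append(IRNode(name, op, list(inputs)))

# Hand-built IR for a single dense layer: y = relu(W @ x + b).
g = IRGraph()
g.add("x", "input")
g.add("W", "weight")
g.add("b", "weight")
g.add("mm", "matmul", ["W", "x"])
g.add("sum", "add", ["mm", "b"])
g.add("y", "relu", ["sum"])

# A backend compiler would walk the graph and emit FPGA instructions.
for n in g.nodes:
    print(f"{n.name} = {n.op}({', '.join(n.inputs)})")
```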
AI requires multiple technologies to efficiently manage various workload requirements. Intel offers a broad set of technologies to enable the market’s evolution, including Intel Xeon processors, Intel FPGAs and Intel Nervana ASIC technology. Compared to dedicated deep learning hardware accelerators that are optimised to run a single workload, the flexibility of Intel FPGAs enables users to customise the hardware to meet specific workload requirements, and reconfigure the hardware rapidly as deep learning workloads and use models change.
In conjunction with the latest Intel Xeon Scalable processors, Intel FPGAs are customisable and programmable, delivering low latency and flexible precision with higher performance per watt for deep learning inference than Intel Xeon processors alone. Intel Stratix 10 FPGAs combine hardened processor blocks, which deliver high levels of sustained performance and efficiency, with a programmable fabric for user customisation.