[Image] Source: IBM. Preview of IBM Spyre Accelerator (left) next to IBM Telum II Processor (right).
IBM has revealed architecture details for the upcoming IBM Telum II Processor and the IBM Spyre Accelerator. The new technologies are designed to significantly scale processing capacity across next-generation IBM Z mainframe systems, helping accelerate the use of traditional AI models and large language models in tandem through a new ensemble method of AI.
With many generative AI projects leveraging large language models (LLMs) moving from proof-of-concept to production, the demands for power-efficient, secured and scalable solutions have emerged as key priorities. Morgan Stanley research published in August projects generative AI's power demands will skyrocket 75% annually over the next several years, putting it on track to consume as much energy in 2026 as Spain did in 2022.*
Many IBM clients have indicated architectural decisions to support appropriately-sized foundation models and hybrid-by-design approaches for AI workloads are increasingly important.
The key innovations unveiled include:
IBM Telum II Processor
The Telum II processor will be the central processor powering IBM's next-generation IBM Z and IBM LinuxONE platforms. The new processor is expected to support enterprise compute solutions for LLMs.
Compared with the first-generation Telum chip, the new IBM chip features increased frequency, greater memory capacity, 40% more cache, an integrated AI accelerator core, and a coherently attached data processing unit (DPU).
The chip features eight high-performance cores running at 5.5 GHz, with 36 MB of L2 cache per core and a 40% increase in on-chip cache capacity, for a total of 360 MB. The virtual level-4 cache of 2.88 GB per processor drawer represents a 40% increase over the previous generation.
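A quick arithmetic check on the stated cache figures helps put them in perspective. Note that the breakdown into cache instances beyond the eight cores is an inference from the numbers, not an IBM-confirmed layout:

```python
# Sanity-check the announced cache figures (36 MB per L2, 360 MB total on chip).
L2_PER_INSTANCE_MB = 36
TOTAL_ON_CHIP_MB = 360

instances = TOTAL_ON_CHIP_MB // L2_PER_INSTANCE_MB
print(instances)  # 10 -> more 36 MB L2 instances than the 8 cores alone

# Virtual L4 per drawer is 2.88 GB, stated as a 40% increase,
# which implies roughly this capacity for the previous generation:
prev_l4_gb = round(2.88 / 1.4, 2)
print(prev_l4_gb)  # 2.06
```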
The integrated AI accelerator allows for low-latency, high-throughput in-transaction AI inferencing, for example enhancing fraud detection during financial transactions, and provides a fourfold increase in compute capacity per chip over the previous generation.
I/O Acceleration Unit
A completely new data processing unit (DPU) on the Telum II processor chip is engineered to accelerate complex input-output (I/O) protocols for networking and storage on the mainframe. The DPU simplifies system operations and can improve key component performance.
The new I/O Acceleration Unit DPU is integrated into the Telum II chip. It is designed to improve data handling with a 50% increased I/O density. This advancement enhances the overall efficiency and scalability of IBM Z, making it well suited to handle the large-scale AI workloads and data-intensive applications of today.
IBM Spyre Accelerator
The IBM Spyre Accelerator provides additional AI compute capability to complement the Telum II processor. Working together, the Telum II and Spyre chips form a scalable architecture to support ensemble methods of AI modelling – the practice of combining multiple machine learning or deep learning AI models with encoder LLMs. By leveraging the strengths of each model architecture, ensemble AI may provide more accurate and robust results compared to individual models.
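A minimal sketch of what an ensemble scoring step might look like in the spirit described above: a fast traditional model screens every transaction, and a heavier LLM-style scorer is consulted only when the first score is ambiguous. All function names, weights, and thresholds here are illustrative assumptions, not IBM APIs:

```python
# Illustrative ensemble: cheap model first, LLM only for ambiguous cases.

def traditional_score(features: dict) -> float:
    # Stand-in for a compact neural network or gradient-boosted model.
    return min(1.0, 0.1 * features.get("amount", 0) / 1000)

def llm_score(description: str) -> float:
    # Stand-in for an encoder-LLM that reads free-text transaction notes.
    return 0.9 if "urgent wire" in description.lower() else 0.2

def ensemble_score(features: dict, description: str) -> float:
    fast = traditional_score(features)
    if 0.3 <= fast <= 0.7:            # ambiguous band: bring in the LLM
        return 0.5 * fast + 0.5 * llm_score(description)
    return fast                        # clear-cut cases skip the heavy model

print(ensemble_score({"amount": 5000}, "Urgent wire transfer request"))
```

The design choice mirrors the low-latency motivation in the announcement: most transactions are resolved by the fast model, and the expensive model runs only where it adds accuracy.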
The purpose-built, enterprise-grade accelerator features up to 1 TB of memory, built to work in tandem across the eight cards of a regular I/O drawer to support AI model workloads across the mainframe, while being designed to consume no more than 75 W per card. Each chip will have 32 compute cores supporting int4, int8, fp8, and fp16 datatypes for both low-latency and high-throughput AI applications.
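A back-of-the-envelope sketch of why those low-precision datatypes matter against the stated 1 TB of card memory: each halving of bits per parameter halves a model's memory footprint. The 70-billion-parameter model size is an illustrative assumption, not an IBM figure:

```python
# Bytes per parameter for each precision the Spyre card supports.
BITS = {"fp16": 16, "fp8": 8, "int8": 8, "int4": 4}

def model_footprint_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight-storage footprint in GB for a given precision."""
    bytes_per_param = BITS[dtype] / 8
    return params_billions * 1e9 * bytes_per_param / 1e9

# Hypothetical 70B-parameter model at three precisions:
for dtype in ("fp16", "fp8", "int4"):
    print(dtype, model_footprint_gb(70, dtype), "GB")  # 140.0, 70.0, 35.0
```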
The IBM Spyre Accelerator chip will be delivered as an add-on option. Each accelerator chip is attached via a 75-watt PCIe adapter and is based on technology developed in collaboration with IBM Research. As with other PCIe cards, the Spyre Accelerator is scalable to fit client needs.
"Our robust, multi-generation roadmap positions us to remain ahead of the curve on technology trends, including escalating demands of AI," said Tina Tarquinio, VP, Product Management, IBM Z and LinuxONE.
"The Telum II Processor and Spyre Accelerator are designed to deliver high-performance, secured, and more power efficient enterprise computing solutions. After years in development, these innovations will be introduced in our next generation IBM Z platform so clients can leverage LLMs and generative AI at scale."
The Telum II processor and the IBM Spyre Accelerator will be manufactured by IBM's long-standing fabrication partner, Samsung Foundry, and built on its high-performance, power-efficient 5 nm process node. Working in concert, they will support a range of advanced AI-driven use cases designed to unlock business value and create new competitive advantages.
With ensemble methods of AI, clients can achieve faster, more accurate results on their predictions. The combined processing power announced will provide an on-ramp for the application of generative AI use cases, such as:
Insurance claims fraud detection
Enhanced fraud detection in home insurance claims through ensemble AI, which combines LLMs with traditional neural networks geared for improved performance and accuracy.
Advanced anti-money laundering
Advanced detection for suspicious financial activities, supporting compliance with regulatory requirements and mitigating the risk of financial crimes.
AI assistants
Driving the acceleration of application lifecycles, transfer of knowledge and expertise, code explanations as well as transformation, and more.
Details
The Telum II processor is expected to be available to IBM Z and LinuxONE clients in 2025. The IBM Spyre Accelerator, currently in tech preview, is also expected to be available in 2025.
*Source: Morgan Stanley Research, August 2024.