Google has unveiled its eighth-generation tensor processing unit (TPU), specialized for artificial intelligence (AI) training and inference. Long viewed in the industry as a challenger to Nvidia's graphics processing units (GPUs), the TPU line delivers triple the training speed and ultra-low-latency inference in this eighth generation.
On the 22nd (local time), at the Google Cloud Next event held at the Mandalay Bay Convention Center in Las Vegas, Google introduced the TPU 8t and TPU 8i, optimized for training and inference, respectively.
A TPU is an application-specific integrated circuit (ASIC) that Google designs and continues to develop in-house for its own AI services. With an optimized power delivery architecture, it is regarded as more power efficient than Nvidia's GPUs. First deployed in Google Cloud data centers in early 2015, the chips have played a key role in reducing Google's reliance on Nvidia GPUs.
First, the TPU 8t triples training performance over the seventh-generation Ironwood by combining high compute throughput with shared high-bandwidth memory (HBM). Using inter-chip interconnect (ICI) technology, it also scales to pods of up to 9,600 chips with up to 2 PB (petabytes) of HBM capacity. According to Google, this TPU can cut the time required to develop state-of-the-art AI models from months to weeks.
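For context, a quick back-of-envelope calculation shows what those pod-level figures imply per chip. This is a sketch only: the announcement states pod totals, and the per-chip HBM capacity of the 8t is an inference from them, not a disclosed figure.

```python
# Implied per-chip HBM in a full TPU 8t pod, derived from the stated
# pod totals (9,600 chips, 2 PB of HBM). Assumes decimal units and an
# even split across chips; Google did not state a per-chip figure.

POD_CHIPS = 9_600
POD_HBM_BYTES = 2e15  # 2 PB, decimal

per_chip_gb = POD_HBM_BYTES / POD_CHIPS / 1e9
print(f"Implied HBM per chip: {per_chip_gb:.0f} GB")  # ~208 GB
```

Read that way, the 2 PB spread across 9,600 chips works out to roughly 208 GB per chip, in the same ballpark as the 288 GB carried by the inference variant described below.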
Optimized for inference, the TPU 8i pairs 288 GB of HBM with 384 MB of fast SRAM and cuts inter-chip data movement by more than half. This translates into faster response times for AI services and, according to the company, helps avoid bottlenecks in workloads ranging from everyday AI chatbot responses to robots and agents. Power efficiency has also been significantly improved, boosting performance per dollar by 80% over the previous generation.
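The numbers themselves hint at the design. The 384 MB of SRAM is a tiny fraction of the 288 GB of HBM, which suggests it acts as a fast on-chip staging tier rather than primary model storage; that interpretation is an assumption, since the announcement does not spell out the SRAM's role. A quick check using only the figures above:

```python
# Figures from the TPU 8i announcement: 288 GB HBM, 384 MB on-chip SRAM,
# and an 80% performance-per-dollar gain over the prior generation.
# Reading the SRAM as a staging tier is an assumption, not a stated fact.

HBM_BYTES = 288e9            # 288 GB high-bandwidth memory
SRAM_BYTES = 384e6           # 384 MB on-chip SRAM
PERF_PER_DOLLAR_GAIN = 1.80  # 80% improvement, per Google

print(f"SRAM / HBM capacity ratio: {SRAM_BYTES / HBM_BYTES:.2%}")  # ~0.13%

# What an 80% perf-per-dollar gain means for a fixed serving budget:
# the same spend covers 1.8x the inference throughput.
baseline = 1.0  # normalized prior-generation throughput per dollar
print(f"Relative throughput per dollar: {baseline * PERF_PER_DOLLAR_GAIN:.1f}x")
```

In other words, the SRAM amounts to about 0.13% of HBM capacity, and the claimed efficiency gain means a serving fleet of the same cost could handle 1.8 times the previous generation's inference load.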
Thomas Kurian, CEO of Google Cloud, explained the decision to split the AI chips into two variants, saying, "As generative AI spreads widely, we judged that people would want systems optimized for training and systems tailored for inference," and added, "Anticipating that power would become a constraint on scaling AI infrastructure, we focused on maximizing energy efficiency from the design stage."