On the 22nd (local time), Google unveiled its newly developed 8th-generation tensor processing unit (TPU), shaking up the balance of the artificial intelligence (AI) chip market. Observers say the paradigm is shifting from a structure centered on Nvidia's graphics processing units (GPUs), which have dominated the AI semiconductor market, to a "multi-layer architecture" that combines application-specific integrated circuits (ASICs), CPUs, and cloud infrastructure.
◇New-concept AI chips that eliminate GPU inefficiencies
With the recent introduction of the 8th-generation TPU, Google formalized its strategy of keeping training and inference chips separate while steadily expanding the share of its in-house AI semiconductors. In particular, the inference-only TPU focuses on minimizing latency and maximizing cost efficiency, both crucial for large-scale AI services.
As AI infrastructure shifts from "model training" to "real-time inference," there is growing recognition that general-purpose GPUs alone cannot respond efficiently. In fact, TPUs are known to deliver up to four times better price-performance than GPUs on certain workloads and to be far more power-efficient.
This change goes beyond a simple performance race; it signals a transformation in the design philosophy of AI infrastructure itself. Nvidia's GPUs are "general-purpose accelerators" that started in graphics processing and expanded into AI. By contrast, TPUs are ASICs optimized for deep learning from the start. That difference is increasingly emerging as a key variable in judging investment efficiency in the AI data center market.
As AI models scale, data movement, power consumption, and cost structure become the core competitive factors, giving chips optimized for specific tasks the advantage. Some analyses show TPUs can cut costs by up to 80% compared with GPUs, and some companies are migrating their GPU clusters to TPUs to slash inference costs.
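For context (our own back-of-the-envelope arithmetic, derived only from the figures quoted above): a fourfold price-performance advantage corresponds to a 75% cost saving for the same workload, so the 80% figure implies roughly a fivefold advantage on the most favorable workloads.

$$
1 - \tfrac{1}{4} = 75\% \quad (\text{fourfold price-performance}), \qquad
1 - \tfrac{1}{5} = 80\% \quad (\text{roughly fivefold})
$$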
◇Breaking the "memory bottleneck" with expanded SRAM... a strategy resembling Nvidia's
The highlight of the 8th-generation TPU design is that it triples the capacity of on-chip SRAM, the ultra-fast storage area inside the chip, to attack data bottlenecks. Sitting right next to the compute units like an "ultra-fast drawer," SRAM cuts the time spent shuttling data to and from external high-bandwidth memory (HBM). In line with the AI trend in which the efficiency of data pathways matters more than raw compute, Google chose the direct approach of significantly expanding SRAM despite its high unit cost.
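A minimal JAX sketch of the same principle at the software level (our illustration, not Google's TPU code; the function name and shapes are assumptions): under jax.jit, the XLA compiler can fuse element-wise operations into the surrounding kernel, so intermediate results stay in fast on-chip memory instead of round-tripping through HBM.

```python
import jax
import jax.numpy as jnp

@jax.jit
def fused_layer(x, w):
    y = x @ w                 # the matmul reads its operands from HBM
    y = jnp.maximum(y, 0.0)   # XLA can fuse this ReLU into the same kernel,
    return y * 1.702          # so these intermediates need never be written
                              # back to HBM as separate tensors

x = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
w = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
print(fused_layer(x, w).shape)  # (1024, 1024)
```

The fewer times data crosses the chip-to-HBM boundary, the less the memory pathway, rather than raw compute, dictates throughput; larger SRAM widens how much work can stay on-chip per crossing.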
SRAM is not a separate component but a key piece of intellectual property (IP) placed inside the chip at the design stage. Google drafts the blueprint with Broadcom and fabricates the chips on TSMC's latest process. Tripling the area-hungry SRAM means Google pushed the foundry's advanced process capabilities to the limit to maximize data efficiency. That mirrors Nvidia's strategy of enlarging cache memory in Blackwell and suggests that the decisive battleground in AI semiconductors has shifted to "memory optimization."
Another notable point is the design that links 1,152 chips with an optical circuit switch (OCS) to expand usable memory capacity sevenfold. Solving the bottlenecks of large models at the system level is the core competitiveness of this TPU. This presents both opportunities and challenges for Korea's memory industry. While HBM adoption continues to rise, the growing share of on-chip SRAM means Samsung Electronics and SK hynix will likely need more advanced responses, such as "custom HBM" that is more tightly integrated with chip design.
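A hedged JAX sketch of what system-level memory pooling looks like to a programmer (our illustration; the mesh axis name, tensor shape, and device count are assumptions, and real pods span far larger meshes): sharding a weight matrix across every interconnected chip means the memory available to one model scales with the number of chips, not with a single chip's HBM.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis spanning every chip the runtime can reach over the interconnect.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard a weight matrix along its rows: each chip holds 1/N of the tensor,
# so a model too large for one chip's HBM still fits in the pooled capacity.
sharding = NamedSharding(mesh, P("model", None))
weights = jax.device_put(jnp.zeros((16384, 8192), jnp.bfloat16), sharding)
print(weights.sharding)  # shows the per-chip layout
```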
◇Intel finds new openings as CPU use rises
CPU demand within AI infrastructure is also likely to grow. In this TPU system, Google integrated a CPU of its own design, redefining the CPU not as a mere auxiliary compute device but as the "conductor of the AI orchestra." In environments where numerous AI agents run simultaneously, a general-purpose processor is essential for job scheduling, data-flow management, and system control.
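A conceptual sketch of that division of labor (our illustration, not Google's system code; the names infer and serve are hypothetical): the accelerator executes the compiled numeric kernel, while the host CPU handles the orchestration around it.

```python
import jax
import jax.numpy as jnp

@jax.jit
def infer(params, batch):
    # The accelerator runs this compiled numeric kernel...
    return jnp.tanh(batch @ params)

def serve(params, requests):
    # ...while the host CPU does the orchestration: ordering the work,
    # dispatching each batch, and routing results back to callers.
    for batch in requests:
        yield jax.device_get(infer(params, batch))

params = jnp.ones((8, 8), jnp.float32)
requests = [jnp.ones((4, 8), jnp.float32) for _ in range(3)]
for out in serve(params, requests):
    print(out.shape)  # (4, 8)
```

As the number of concurrent models and agents grows, this host-side loop, not the kernel itself, is where scheduling and data-flow pressure accumulates.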
In its first-quarter earnings conference call this year, Intel emphasized that the ratio of CPUs to GPUs in AI data centers has shifted from about 1:8 in the past to around 1:4 recently, and even raised the possibility of a 1:1 ratio, or CPUs outnumbering GPUs, going forward. Intel CEO Pat Gelsinger said, "As AI processing moves toward inference, CPUs are more efficient for task coordination and control, and for managing diverse agents and data."
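To make the shift concrete (our illustrative arithmetic, not Intel's figures): with the GPU fleet held fixed, moving from 1:8 to 1:4 doubles the number of CPUs deployed alongside it.

$$
\frac{N_{\mathrm{GPU}}}{8} \;\longrightarrow\; \frac{N_{\mathrm{GPU}}}{4} \;=\; 2\cdot\frac{N_{\mathrm{GPU}}}{8}
$$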
Meanwhile, Google's strategy appears likely to accelerate the move away from Nvidia. Nvidia still holds an 80%–90% share of the AI accelerator market. But some forecasts suggest that if TPU adoption gains momentum and other big tech companies likewise maximize their use of custom AI semiconductors and CPUs optimized for their own services, as much as 10% of Nvidia's revenue could be eroded.