Nvidia has unveiled new chips tailored for artificial intelligence (AI) inference, along with a new central processing unit (CPU), aiming to provide infrastructure suited to the age of AI agents.
In a keynote on the 16th at Nvidia's annual developer conference, GTC 2026, held at the SAP Center in San Jose, California, Chief Executive Officer Jensen Huang said the Groq3 language processing unit (LPU) will be integrated into the next-generation AI chip Vera Rubin.
The plan is to divide the roles of the Rubin graphics processing unit (GPU) and the LPU to improve efficiency: the GPU handles large-scale computation over vast amounts of data, while the ultra-fast LPU generates AI responses. Vera Rubin combines 36 Vera CPUs and 72 Rubin GPUs into a single system, boosting inference performance fivefold over Blackwell while cutting the cost per token to one-tenth, and it launches this year.
Huang also introduced Feynman, the GPU that will succeed Rubin. Feynman will be paired with a new CPU called Rosa and is slated to incorporate the LP40 LPU. "By next year, Nvidia's AI chip sales opportunity will reach at least $1 trillion (about 1,500 trillion won)," Huang said.
Huang said that combining the LPU with Vera Rubin can improve inference throughput by 35 times for AI models with trillions of parameters, while also strengthening low-latency inference. Specifically, Nvidia said an LPX rack, which packs 256 LPUs into a single unit, will be incorporated into Vera Rubin. As a result, the Vera Rubin supercomputer now comprises seven components, up from the six announced at CES 2026 in January, with the LPU as the addition.
Huang also unveiled the new Vera CPU, along with a CPU rack containing 256 of them. The company said it delivers 1.5 times the performance and twice the energy efficiency of existing x86 CPUs. The Vera CPU is built on the Olympus core, which Nvidia designed in-house for AI workloads, and offers three times the memory bandwidth of x86 CPUs.
Nvidia appears set to target the AI agent market with these products. Unlike general AI chatbots, AI agents demand greater speed and finer control. The GPU accelerates data processing, while the LPU takes on the role of handling specific task instructions.