Gartner's generative AI inference cost scenario outlook. / Courtesy of Gartner

Gartner said on the 30th that even if the inference cost of large language models drops significantly by 2030, the artificial intelligence (AI) cost burden on enterprises will persist.

Gartner projected that the inference cost of a 1-trillion-parameter (1,000B) large language model (LLM) will fall by more than 90% compared with 2025. As a result, it said cost efficiency for models of the same size could improve by up to 100 times.

An AI token is the basic unit of data processed by a generative AI model, equivalent to about 3.5 bytes (roughly four characters).

Will Sommer, a Gartner senior director analyst, explained, "This cost reduction is enabled by improvements in semiconductor and infrastructure efficiency, innovations in model design, higher chip utilization, the spread of inference-specialized semiconductors, and broader deployment to edge devices."

In this outlook, Gartner analyzed the cost structure through two semiconductor-based scenarios: ▲ a frontier scenario ▲ a legacy blend scenario. The legacy blend scenario showed higher costs than the frontier scenario due to performance limits.

However, Gartner assessed that falling token unit prices will not directly translate into AI cost savings for enterprises.

This is because the spread of AI agents is restructuring the overall cost picture, with token usage per task increasing by 5 to 30 times. As a result, even if token unit prices fall, the total inference cost that enterprises actually bear is likely to increase.
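Gartner's two figures make the arithmetic easy to check: a 90% drop in token unit price is exactly offset once a workload consumes 10 times the tokens, and is overwhelmed well before the 30-times upper bound. A minimal back-of-envelope sketch with normalized, illustrative numbers (not Gartner's own model):

```python
# Illustrative arithmetic only: unit price normalized to 1.0 in 2025,
# dropping 90% by 2030 per the article; agentic token usage per task
# rises 5-30x per the article.

price_2025 = 1.00   # normalized cost per token, 2025
price_2030 = 0.10   # after a 90% unit-price drop

for usage_multiplier in (5, 10, 30):        # agentic usage growth range
    total_2030 = price_2030 * usage_multiplier
    change_pct = (total_2030 / price_2025 - 1) * 100
    print(f"{usage_multiplier:>2}x tokens -> total cost {total_2030:.2f} "
          f"({change_pct:+.0f}% vs 2025)")
```

At 10x usage the cheaper tokens are fully offset; at 30x, total inference cost triples despite tokens being 90% cheaper.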

Sommer also cautioned, "Do not mistake falling prices for general-purpose tokens for the democratization of advanced reasoning capabilities," adding, "Basic AI functions are approaching near-zero cost, but the computing resources and systems needed for advanced reasoning remain limited." He added, "Enterprises that mask architectural inefficiencies with cheap token costs may hit limits in the next stage of agent-based AI scaling."

Gartner also emphasized that future AI competitiveness depends not on a single model but on a "multi-model orchestration" strategy.

It said a structure is needed in which repetitive, simple tasks are handled by small or domain-specialized models, while high-cost, high-performance models are reserved for complex, high-value work.
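The routing structure Gartner describes can be sketched as a cheapest-capable-model policy. All model names, costs, and the complexity rating below are hypothetical placeholders, not real APIs or Gartner recommendations:

```python
# Sketch of multi-model orchestration: route each task to the cheapest
# model rated for its complexity. Everything here is illustrative.

from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float   # relative cost units (hypothetical)
    max_complexity: int         # highest task complexity it handles well

# Ordered cheapest-first; the router picks the first capable entry.
MODELS = [
    Model("small-domain-model", 0.1, max_complexity=3),   # repetitive, simple tasks
    Model("mid-general-model",  1.0, max_complexity=7),
    Model("frontier-model",    10.0, max_complexity=10),  # complex, high-value work
]

def route(task_complexity: int) -> Model:
    """Return the cheapest model whose rating covers the task."""
    for model in MODELS:
        if task_complexity <= model.max_complexity:
            return model
    return MODELS[-1]   # fall back to the strongest model

print(route(2).name)   # simple task goes to the cheap model
print(route(9).name)   # complex task goes to the frontier model
```

The design choice is that the expensive model is only reached when cheaper tiers are rated out, which is how the orchestration strategy keeps average cost low without capping peak capability.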

This approach, it said, would let enterprises achieve both cost efficiency and performance.

※ This article has been translated by AI.