Korea's artificial intelligence (AI) fabless (chip design) sector has stepped forward with new compute architectures to address the data center limits and bottlenecks of the graphics processing unit (GPU)-centric paradigm exposed by the spread of Generative AI. As it becomes harder to meet surging AI demand with existing methods, a strategy of distributing computation across on-device chips, inference infrastructure, and large language model (LLM)-specialized architectures is gaining ground. This shift opens a phase that calls for diverse compute approaches beyond GPUs, creating new entry opportunities for domestic Neural Processing Unit (NPU)-based fabless companies.

As the size and compute load of Generative AI models grow rapidly, simply adding more servers is seen as insufficient to solve power, heat, and cost problems. The industry is moving away from processing AI on a single machine and toward distributing computation across multiple types of chips and layers.

Kim Nok-won, CEO of DEEPX, gives a lecture at the Korea Tech Festival held at COEX in Seoul on the 3rd. /Courtesy of Choi Hyo-jung

At the Ministry of Trade, Industry and Energy's "2025 Korea Tech Festival" AI fabless special pavilion on the 3rd, domestic companies presented a three-track approach: ▲ on-device processing that handles portions of computation on devices such as smartphones, robots, and home appliances ▲ inference infrastructure built at national and corporate scales ▲ LLM-dedicated architectures. This suggests that an "AI semiconductor division of labor," in which application-specific chips share roles, is taking shape, marking a move away from the first-generation AI compute paradigm reliant on general-purpose GPUs.

Power efficiency (performance per watt) is also emerging as a key competitive metric. In the Generative AI era, competitiveness is determined less by raw chip speed than by how much computation can be done with less power. Domestic companies are accelerating performance-per-watt gains by adopting Samsung Electronics' 2-nanometer and 4-nanometer processes along with high bandwidth memory (HBM). Collaboration on the latest Samsung foundry processes matters not only because it helps domestic fabless firms secure product competitiveness, but also because it signals that the foundry–fabless semiconductor ecosystem is beginning to consolidate in practice.
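
To make the metric concrete, the short sketch below computes performance per watt for two hypothetical chips. All figures are illustrative assumptions, not measurements of any product mentioned in this article.

```python
# Illustrative sketch of the performance-per-watt metric.
# The throughput and power figures below are assumptions, not real chip specs.
chips = {
    "general-purpose GPU (assumed)": {"tops": 1000, "watts": 700},
    "dedicated NPU (assumed)": {"tops": 100, "watts": 10},
}

for name, spec in chips.items():
    print(f"{name}: {spec['tops'] / spec['watts']:.1f} TOPS/W")

# A chip with lower absolute throughput can still win on TOPS/W,
# which is what drives power and cooling costs at data center scale.
```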

On-device AI company DeepX projected that the center of Generative AI will shift from the cloud to devices. Chief Executive Kim Nok-won said, "If we can run a 100-billion-parameter model under 5 watts (W), we can cut more than 80% of data center traffic," adding, "on-device AI will become the key infrastructure of the 'physical AI' era." Kim also said, "With a chip under $50, we implemented compute on par with a $3,000 GPU," emphasizing the competitiveness of ultra-low-power NPUs. DeepX is developing a Generative AI chip on Samsung Electronics' 2-nanometer process and said, "Alongside Tesla, we are among the first 2-nanometer customers."

Rebellions, which develops inference-dedicated chip architectures, pointed to growing demand for sovereign AI at national and corporate scales as a key opportunity. Head of business Kim Kwang-jung said, "Only a few countries build foundation models, but the inference infrastructure to operate them is needed by every company and government worldwide," adding, "efficient inference infrastructure determines national competitiveness." Kim said, "With our next-generation HBM-based chip, we demonstrated running a 7-billion-parameter model on a single chip," and explained, "through a vLLM-based SDK, we provide a development environment similar to CUDA."
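
For context, the sketch below shows what the open-source vLLM Python interface looks like when serving a 7-billion-parameter model. It illustrates the style of development environment a "vLLM-based SDK" implies; it is not Rebellions' actual SDK, and the model name is only an example.

```python
# Minimal sketch using the open-source vLLM Python API (pip install vllm).
# This illustrates the interface style only; it is not Rebellions' SDK,
# and the model below is just an example 7-billion-parameter model.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Why is inference infrastructure important?"], params)
print(outputs[0].outputs[0].text)
```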

HyperExcel, which aims to resolve LLM compute bottlenecks, put forward its LLM-dedicated architecture, the LPU. Chief Executive Kim Ju-young said, "Transformer models suffer severe memory bottlenecks, making it hard to keep inference unit costs manageable on GPUs," adding, "now is the time for an architecture dedicated solely to LLMs." Kim said, "By shortening the path from DRAM to logic and adopting a single large-core design, we raised LLM performance to more than twice that of GPUs," and explained, "we will sample a Samsung 4-nanometer chip co-designed with Naver next year, and we are also developing an on-device LLM chip with LG."
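
The memory bottleneck Kim describes can be made concrete with a rough calculation: during token-by-token decoding, a model's weights must be streamed from memory for every generated token, so memory bandwidth rather than arithmetic speed caps throughput. The numbers below are assumptions for illustration only.

```python
# Back-of-envelope sketch of bandwidth-bound LLM decoding.
# All figures are illustrative assumptions, not specs of any product above.
params = 7e9              # 7-billion-parameter model
bytes_per_param = 2       # FP16 weights
bandwidth = 1.0e12        # assumed ~1 TB/s of HBM/DRAM bandwidth

bytes_per_token = params * bytes_per_param   # weights read once per token
tokens_per_second = bandwidth / bytes_per_token

print(f"Rough upper bound: {tokens_per_second:.0f} tokens/s for a single stream")
# However fast the compute units are, this ceiling only rises by moving data
# faster or over a shorter path, which is what an LPU-style design targets.
```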

AnalogAI, an analog AI semiconductor company, presented an "Analog In-Memory Computing" strategy intended to address the power limits of digital approaches at a fundamental level. Chief Executive Lee Jae-joon said, "While AI models are growing exponentially, existing methods consume massive power moving data back and forth between memory and processors," adding, "the analog approach, which performs computation within the memory cell itself, can dramatically improve power efficiency."

Lee said, "There is room for improvements of 100 to 10,000 times over existing methods," and added, "we will launch our first chip in 2027, targeting application markets such as Augmented Reality (AR) glasses and humanoid robots."
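
As a point of reference, the core operation analog in-memory computing targets is the matrix-vector multiply sketched below. In a digital chip, the weight matrix must be fetched from memory for every multiply-accumulate; in an analog array, the weights stay put as cell conductances and currents sum in place. The sketch is purely conceptual, with arbitrary sizes and values.

```python
# Conceptual sketch of the matrix-vector multiply behind in-memory computing.
# Sizes and values are arbitrary; this is a digital stand-in for an analog array.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))   # weights: conceptually stored in memory cells
x = rng.standard_normal(256)          # input activations: applied as voltages

y = W @ x   # the sum-of-products an analog array computes physically, in place
print(y.shape)
```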

While domestic fabless firms' strategies follow different technological axes, from ultra-low power and inference infrastructure to LLM specialization and analog computation, they converge in preparing for the "post-GPU era." A semiconductor industry official said, "As the AI semiconductor market shifts from a general-purpose GPU-centric paradigm to application-optimized architectures, technical and industrial avenues are opening in earnest for domestic NPU fabless firms to expand their presence."
