The research team led by Professor Park Jong-se of the Korea Advanced Institute of Science and Technology (KAIST) School of Computing conducted the study jointly with researchers at the Georgia Institute of Technology in the U.S. and Uppsala University in Sweden. /Courtesy of KAIST

As artificial intelligence (AI) models grow larger and more complex, the need for semiconductor technology that computes faster while using less electricity is growing. Against this backdrop, Korea Advanced Institute of Science and Technology (KAIST) and overseas researchers have developed a technology that redesigns the structure of the core semiconductor serving as the "brain" of AI.

A research team led by Professor Park Jong-se of the KAIST School of Computing said on the 17th that, together with researchers at the Georgia Institute of Technology in the United States and Uppsala University in Sweden, it had developed "PIMBA (Processing-In-Memory Based Architecture)," a semiconductor structure that makes inference (the process of producing an answer) up to four times faster than existing GPU-based systems while cutting power consumption to less than half.

PIMBA is a new architecture that combines "Transformer" and "Mamba," the two "brain" structures that determine how AI understands and interprets sentences.

Most large language models (LLMs) such as ChatGPT, Claude, and Gemini currently use the Transformer structure. This method excels at grasping context because it compares every word with every other word at once, but for the same reason the amount of computation grows with the square of the input length, so as models and inputs grow, speed drops and power consumption rises sharply.
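To make that scaling concrete, here is a minimal sketch, not the paper's code or any production implementation: a toy single-head attention in Python whose score matrix grows as the square of the sequence length. All dimensions and sequence lengths below are illustrative assumptions.

```python
# A minimal sketch (illustrative, not the paper's code): toy single-head
# attention showing why compute "explodes" with input length. The score
# matrix compares every token with every other token, so it holds n x n
# entries for a sequence of length n.
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for one head."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # shape (n, n): quadratic in n
    scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 64                                  # assumed per-head dimension
for n in (128, 1024, 4096):             # growing sequence lengths
    Q = K = V = np.random.randn(n, d)
    attention(Q, K, V)
    print(f"n={n:5d}: score matrix holds {n * n:>12,d} entries")
```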

To solve this problem, the "Mamba" structure, which processes words in order, has recently emerged. It improves efficiency by compressing what it has read so far into a state that is stored and updated in chronological order, but a "memory bottleneck" remained because that data still had to be pulled out of memory for every computation.
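The sequential idea can be sketched in a few lines of Python. This is not Mamba's actual formulation, only an illustrative state-space-style recurrence with made-up matrices and sizes; its point is that each word updates a fixed-size state, so per-word work stays constant, yet that state must travel to and from memory on every step.

```python
# An illustrative sketch (not Mamba itself): a state-space-style recurrence
# in which each incoming word updates a fixed-size state instead of
# attending to the whole history. All matrices and sizes are made up.
import numpy as np

d_state, d_model = 16, 64
A = np.random.randn(d_state, d_state) * 0.01   # state transition (illustrative)
B = np.random.randn(d_state, d_model) * 0.01   # input projection (illustrative)
C = np.random.randn(d_model, d_state)          # output projection (illustrative)

state = np.zeros(d_state)
for token_vec in np.random.randn(1000, d_model):   # 1,000 tokens, one pass
    # The state must be read from memory and written back on every step;
    # this round trip is the "memory bottleneck" described above.
    state = A @ state + B @ token_vec
    output = C @ state
```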

To preserve the strengths of both structures while eliminating the bottleneck, the research team devised a "semiconductor structure that performs computation directly inside memory." Conventional GPUs (graphics processing units) fetch data from memory to perform computation, but PIMBA carries out calculations inside the memory device itself, without moving the data. Eliminating that data movement increases speed and significantly reduces power consumption.
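A back-of-envelope sketch, using our own illustrative numbers rather than the paper's, shows why skipping that movement matters: a processor-side design ships the state across the memory bus twice per generated token, while an in-memory design computes where the state is stored.

```python
# Back-of-envelope comparison (illustrative numbers, not the paper's):
# bytes shipped across the external memory bus per generated response.
state_bytes = 16 * 1024 * 1024   # assume 16 MB of model state kept in memory
tokens = 1000                    # assume 1,000 tokens generated per response

# Processor-side compute: the state is read out and written back every token.
gpu_traffic = 2 * state_bytes * tokens

# Processing-in-memory: the update happens where the state lives, so
# (to first order) no state crosses the external memory bus.
pim_traffic = 0

print(f"processor-side traffic: {gpu_traffic / 1e9:.1f} GB per response")
print(f"in-memory traffic:      ~{pim_traffic} bytes across the bus")
```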

In experiments, the PIMBA structure processed inference up to 4.1 times faster than existing GPU-based systems while consuming, on average, 2.2 times less power.

The research results will be presented at the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO 2025), a top-tier computer architecture conference to be held in Seoul on Oct. 20. The work earlier won the gold prize at the 31st Samsung HumanTech Paper Award in recognition of its technical excellence.

This research was carried out with support from the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the AI Semiconductor Graduate School Support Program and the Ministry of Science and ICT's ICT R&D Program. The Electronics and Telecommunications Research Institute (ETRI) jointly supported the research, and the Integrated Circuit Design Education Center (IDEC) provided electronic design automation (EDA) tools.

References

arXiv (2025), DOI: https://doi.org/10.48550/arXiv.2507.10178
