Kim Jeong-ho, a professor in the School of Electrical Engineering at KAIST who devised the basic structure of high-bandwidth memory (HBM) and is known as the "father of HBM," officially presented a roadmap for HBF (high-bandwidth flash) at a technical briefing held at the Press Center in Gwanghwamun, Seoul, on the 3rd. "If AI continues on its current path, HBF is not a choice but a necessity," Kim emphasized. Whereas HBM, which stacks DRAM, serves as the high-speed memory required for computation, HBF, built on NAND flash, acts as a large-capacity "memory store" that AI continuously calls upon.
Behind Kim's proposal of HBF is the evolution of AI. AI has progressed from training to inference, and now to agent AI. Rather than answering a single question and stopping, agent AI carries out a sequence of tasks such as reading emails, organizing documents, searching external materials, and making judgments. Combined with multimodal AI, which processes not only text but also images, video, and audio simultaneously, the amount of data AI must handle and its memory requirements are growing to an entirely different level than before.
This is why memory is becoming more important. Explaining the transformer model, the basic structure of current AI models, Kim said, "The KV cache generated in the process of AI understanding the input is not simple data but a kind of codebook that contains the relationships between words and concepts," adding, "I call this the language of god that AI uses." This KV cache is referenced continually as words are generated one by one, and as contexts lengthen and become multimodal, its size grows from hundreds of gigabytes (GB) to terabytes (TB), and in some cases to tens of TB.
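For a rough sense of scale, the KV cache of a transformer grows linearly with context length, layer count, and head dimensions. Below is a minimal back-of-the-envelope sketch; the model dimensions used are illustrative assumptions, not figures cited at the briefing.

```python
# Rough KV-cache size estimate for transformer inference.
# All model dimensions below are illustrative assumptions,
# not figures from the briefing.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len,
                   batch_size=1, bytes_per_value=2):
    """Keys + values stored per token, per layer, per head (fp16 = 2 bytes)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_len * batch_size

# Example: a hypothetical large model with 96 layers, 96 KV heads,
# head dimension 128, serving a 1-million-token context.
size = kv_cache_bytes(num_layers=96, num_kv_heads=96, head_dim=128,
                      context_len=1_000_000)
print(f"{size / 1e12:.1f} TB")  # ~4.7 TB for a single request
```

Even with aggressive optimizations such as grouped-query attention or quantization, caches for long, multimodal contexts can reach the hundreds-of-GB-to-TB range described above.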
The problem is that memory of this scale cannot be handled by existing HBM alone. Kim said, "No matter how much you stack HBM, it is around 200 GB," adding, "The moment the KV cache grows to hundreds of GB or TB, it becomes structurally impossible to handle with HBM alone." He explained, "The reason AI is slow to produce the first word, and the reason its overall generation speed drops, ultimately comes down to memory limits."
The solution Kim presented is HBF. HBF is memory based on high-capacity NAND flash; it is slower than HBM but much faster than an SSD, and above all, it can expand memory capacity dramatically. Kim compared it to a library. "If HBM is the reference book on your desk, HBF is the bookshelf right next to you. In an open-book exam, the desk alone is not enough; you eventually have to pull books from the shelf," he said. As agent and multimodal AI advance, this "bookshelf-type memory" becomes essential.
In fact, Nvidia has said it will introduce a dedicated platform called Inference Context Memory Storage, which stores AI conversation context, in its next-generation Vera Rubin architecture. CEO Jensen Huang said at CES 2026, "This platform will grow into a massive storage market that handles the working memory of AI worldwide." The industry sees a strong possibility that HBF will be adopted as the core memory of this new platform. Just as HBM established itself as the core memory of graphics processing units (GPUs), HBF could emerge as the standard memory for AI inference storage.
That day, Kim also unveiled a memory architecture roadmap that includes HBF. HBF will first be applied in inference-centered environments: large KV caches and long-context data will be handled by HBF, while high-bandwidth computation and model parameter processing will remain with HBM. In other words, HBM covers the areas that need speed, and HBF covers the areas that need capacity. Kim explained that as AI workloads expand, HBF's role and share within the memory system will continue to grow.
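One way to picture this division of labor is a two-tier KV-cache store: recently used entries stay in a small, fast tier (the HBM role), while the bulk of the long-context cache spills to a larger, slower tier (the HBF role). The following is a toy sketch of that idea; the class, capacities, and promotion policy are hypothetical illustrations, not an actual product API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier cache: a small fast tier (stand-in for HBM) backed by a
    large capacity tier (stand-in for HBF). Capacities are arbitrary examples."""

    def __init__(self, fast_capacity=4, slow_capacity=64):
        self.fast = OrderedDict()   # hot entries, limited capacity
        self.slow = OrderedDict()   # capacity tier for long-context data
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def put(self, token_id, kv_block):
        self.fast[token_id] = kv_block
        self.fast.move_to_end(token_id)
        # Evict least recently used blocks from the fast tier into the slow tier.
        while len(self.fast) > self.fast_capacity:
            old_id, old_block = self.fast.popitem(last=False)
            self.slow[old_id] = old_block
            while len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)  # drop oldest if even the slow tier is full

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)
            return self.fast[token_id]
        if token_id in self.slow:
            # Promote back to the fast tier on access.
            block = self.slow.pop(token_id)
            self.put(token_id, block)
            return block
        return None

cache = TieredKVCache()
for t in range(10):
    cache.put(t, f"kv_block_{t}")
print(len(cache.fast), len(cache.slow))  # 4 hot blocks in "HBM", 6 spilled to "HBF"
```

In a real system the policy would be far more involved (prefetching, bandwidth-aware placement, page-sized transfers), but the basic split between a speed tier and a capacity tier is the same.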
Kim described this trend as a competition for memory leadership in the AI era. In the past, the CPU defined computer architecture in the PC era and the application processor (AP) did so in the smartphone era, but in the AI era, memory architecture has emerged as the key factor determining performance and scalability. He mentioned Samsung Electronics and SK hynix by name, saying, "In fact, Korea is the only country that can do HBM and HBF at the same time." Because few companies possess both HBM and NAND flash technology, he said, the two are well positioned to compete.
Kim stressed that GPU performance improvements and changes in memory architecture are inseparable. No matter how fast the GPUs that handle AI computation become, if the supporting memory hierarchy does not keep up, both performance and scalability will inevitably hit a ceiling. For this reason, he predicted that global GPU makers will further strengthen cooperation with Korean memory companies.
Kim said, "As long as the current computer architecture and AI models are maintained, AI will have no choice but to keep using more memory. By 2038, demand for HBF will surpass HBM," adding, "The AI performance race is shifting from computation-centered to memory architecture competition, and whoever first designs and leads the memory hierarchy that encompasses HBM and HBF will set the direction of the industry."