Illustration: ChatGPT

Samsung Electronics and SK hynix are eyeing high-bandwidth flash (HBF) as their next growth engine in step with changes in the artificial intelligence (AI) market. With SK hynix having already begun the process of establishing an "HBF standard" with the U.S. memory semiconductor company SanDisk, some say it has responded to the market faster than Samsung Electronics.

Samsung Electronics also appears to be active in securing related technology, recently obtaining a series of HBF-related patents. According to industry sources, Samsung Electronics has continued HBF-related research since the early 2020s. While it has not made an official announcement like SK hynix, it has steadily examined the potential to expand its business into the HBF space.

The two companies are generating massive revenue by producing high-bandwidth memory (HBM), a core component of AI chips such as graphics processing units (GPUs). HBM, which stacks volatile DRAM to increase data transfer speed, is used to power so many AI services that supply can hardly keep up with demand.

However, as AI services enter commercialization, the center of gravity is shifting from conventional "training" to "inference," which determines real-world service performance, and this shift is exposing HBM's technical limits. The more personalized AI services become, the more data they must retain, yet HBM, built on volatile memory, faces structural constraints that make it difficult to increase capacity.

There are efforts to solve this with HBF, which stacks NAND flash, a nonvolatile memory: while DRAM retains data only when power is supplied, NAND flash keeps stored data even when power is off. Some forecast that HBF demand could surpass HBM demand around 2038.

◇ HBM, which solved the "memory bottleneck," has become essential for AI

To understand why HBF has emerged as a "next growth engine," we must first look at how HBM became essential for AI. That is because HBF is seen as an alternative to overcome the "limits of HBM."

AI is the product of computation. Before AI emerged, the central processing unit (CPU), a serial processor, handled the computations required by computer operating systems. Memory only needed to supply the data the CPU requested, so large capacity was unnecessary. AI, in contrast, is built on massive numbers of parameters. It requires large-scale matrix multiplications and vector operations, and processing these one by one in the traditional serial manner takes a long time. GPUs, specialized for parallel computation, proved better suited to AI. One study found that running the computations for the same AI model on a GPU can cut processing time by about 50% compared with a CPU.
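As a rough illustration of why this matters (a minimal sketch, not the study cited above), the snippet below contrasts a one-element-at-a-time "serial" matrix multiply with a vectorized call that parallel hardware can execute far faster; the matrix size and timing method are arbitrary choices for demonstration.

```python
# Illustrative only: serial element-by-element matmul vs. a vectorized call.
import time
import numpy as np

N = 128
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)

def serial_matmul(x, y):
    """Naive triple loop: one multiply-add at a time, as a purely serial processor would."""
    out = np.zeros((N, N), dtype=np.float32)
    for i in range(N):
        for j in range(N):
            s = 0.0
            for k in range(N):
                s += x[i, k] * y[k, j]
            out[i, j] = s
    return out

t0 = time.perf_counter()
serial_matmul(a, b)
t1 = time.perf_counter()
np.matmul(a, b)  # vectorized path; on GPU libraries this kind of call runs in parallel
t2 = time.perf_counter()

print(f"serial loop : {t1 - t0:.3f} s")
print(f"vectorized  : {t2 - t1:.5f} s")
```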

This created a new problem. The capacity and speed of the DRAM feeding data to GPUs had been tuned to CPUs, so GPUs could not run at full efficiency. In the past, a "compute bottleneck" occurred as DRAM waited for CPU operations to finish before supplying more data; with GPUs, the reverse "memory bottleneck" occurs. If the GPU must pause and wait because it cannot get data from DRAM quickly enough, efficiency inevitably drops. HBM stacks DRAM dies to widen the data pathways and deliver the bandwidth that the parallel computations of GPUs require, establishing itself as a "core component of the AI era." It is no exaggeration to say that the leap in large language model (LLM) performance over the past one to two years was enabled by the combination of GPUs and HBM.
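To make the "memory bottleneck" concrete, here is a back-of-the-envelope sketch; all figures are assumed round numbers for illustration, not the specifications of any actual product. A processing step takes whichever is longer, the compute time or the time needed to move the data, so a narrow memory path leaves the processor idle.

```python
# Toy model: a step is limited by either compute throughput or memory bandwidth.
def step_time(flops, bytes_moved, peak_flops, mem_bandwidth):
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    bound = "memory-bound" if memory_time > compute_time else "compute-bound"
    return max(compute_time, memory_time), bound

# Hypothetical workload: 2e13 FLOPs touching 1e11 bytes of weights/activations,
# on a processor with an assumed 100 TFLOP/s of peak compute.
flops, data = 2e13, 1e11

for name, bw in [("conventional DRAM ~100 GB/s", 1e11),
                 ("stacked HBM-class ~1 TB/s  ", 1e12)]:
    t, bound = step_time(flops, data, peak_flops=1e14, mem_bandwidth=bw)
    print(f"{name} -> {t * 1e3:7.1f} ms per step ({bound})")
# The narrow path leaves the processor stalled on data; the wide, stacked path
# lets the same hardware become compute-bound again.
```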

Sixth-generation high-bandwidth memory (HBM4) product by Samsung Electronics. /Courtesy of Samsung Electronics

◇ In the AI inference era, HBM hits limits… HBF as an alternative

For AI to be offered as a real service, inference performance that operates without latency is crucial. As major big tech companies move to monetize AI services, there is a growing view that the market's focus has shifted from training to inference.

As the number of users simultaneously accessing AI services surges, the existing HBM-centric memory architecture alone struggles to deliver both the large-capacity data processing and the power efficiency required at the inference stage. According to industry estimates, the GPT-4 model used in ChatGPT needs about 3.6 terabytes (TB) of memory for inference, while fifth-generation HBM (HBM3E) currently provides a GPU with only about 192 gigabytes (GB). Serving an inference request requires bundling six to seven GPUs together, which drives up the cost of delivering the service. As models grow larger and more sophisticated, costs rise and latency increases.

Personalized AI services also highlight HBM's limitations. For AI to remember a user's actions and conversations and provide contextually appropriate answers, it must store more data. Because HBM loses data when power is cut, it struggles to ensure continuity, and it is not suitable for dramatically increasing capacity.

This is why a "new memory" that can store large amounts of data and feed it quickly to computing units is needed. Solid-state drives (SSDs) also use NAND flash, but their data transfer speed is too slow for AI.

High-bandwidth flash (HBF) architecture. /Courtesy of SanDisk

◇ NAND is structurally more complex than DRAM… "The companies that solve the technical challenges will lead the market"

HBF is cited as a technology to overcome the limits of this HBM-centric memory architecture. It is expected to offer nonvolatile memory with HBM-like accessibility for compute units. The industry has already seen the gains from stacking DRAM in HBM, and the same approach is now being applied to NAND flash.

Even if HBF is commercialized, HBM is unlikely to disappear from the market. A likely approach in AI chip design is to place HBF between HBM and SSDs to boost performance. The aim is to fill the "gap" between HBM, an ultra-high-speed memory, and SSDs, large-capacity storage devices, securing both the capacity scaling and the power efficiency required for inference.
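A conceptual sketch of that three-tier idea is shown below; the tier names, capacities, and placement rule are illustrative assumptions, not a description of any announced product.

```python
# Illustrative three-tier hierarchy: HBM for the hottest data, an HBF-like tier
# for large warm data, and SSD for cold bulk storage. Figures are assumptions.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: int
    volatile: bool

HIERARCHY = [
    Tier("HBM (fastest, smallest)",  192,   True),
    Tier("HBF (large, nonvolatile)", 4096,  False),
    Tier("SSD (bulk, slowest)",      65536, False),
]

def place(block_gb: float, reuse_per_step: float) -> Tier:
    """Very rough placement rule: frequently reused data goes to the fastest
    tier that can hold it; rarely touched data falls through to slower tiers."""
    if reuse_per_step > 10 and block_gb <= HIERARCHY[0].capacity_gb:
        return HIERARCHY[0]
    if reuse_per_step > 0.1 and block_gb <= HIERARCHY[1].capacity_gb:
        return HIERARCHY[1]
    return HIERARCHY[2]

print(place(block_gb=50,    reuse_per_step=100).name)   # hot working set  -> HBM
print(place(block_gb=3600,  reuse_per_step=1).name)     # full model data  -> HBF
print(place(block_gb=20000, reuse_per_step=0.01).name)  # cold archives    -> SSD
```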

There are several technical challenges in implementing HBF. Today's mainstream NAND is already three-dimensional (3D) flash, which stacks planar (2D) cells vertically. With structures already stacked hundreds of layers high, it is more complex to stack than DRAM. Applying the same through-silicon via (TSV, a packaging technology that drills tiny holes in semiconductor dies to vertically connect the chips above and below) process used in HBM would therefore cause a steep drop in production yield. Because of this, the industry is discussing implementing HBF by stacking entire 3D NAND flash units. A semiconductor industry source said, "The technology itself is challenging, so HBF is more difficult to implement than HBM."

Determining the order of data processing is also tricky. In HBM, the "logic die" at the bottom of the stack controls the multiple DRAM dies. HBF likewise needs a "controller" that accesses multiple NAND flash dies in parallel and sets the order of data processing. However, because NAND flash is structurally slow in input/output speed, analysts say fundamental innovation is needed to design a controller that meets AI performance requirements. A semiconductor industry source said, "To boost controller performance, technologies are being discussed that predict access patterns and prefetch data."
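The quoted idea of predicting access patterns and prefetching can be illustrated with a toy stride prefetcher, a generic textbook technique rather than any company's actual controller design.

```python
# Conceptual sketch: watch recent block accesses, detect a regular stride, and
# request the next blocks before the compute unit asks for them.
from collections import deque

class StridePrefetcher:
    def __init__(self, depth: int = 2):
        self.history = deque(maxlen=3)  # last few accessed block addresses
        self.depth = depth              # how many blocks to fetch ahead

    def on_access(self, block: int) -> list[int]:
        """Return block addresses to prefetch after observing one access."""
        self.history.append(block)
        if len(self.history) < 3:
            return []
        recent = list(self.history)
        strides = [b - a for a, b in zip(recent, recent[1:])]
        if strides[0] == strides[1] and strides[0] != 0:  # stable stride detected
            stride = strides[0]
            return [block + stride * (i + 1) for i in range(self.depth)]
        return []

pf = StridePrefetcher()
for addr in [100, 104, 108, 112]:  # a streaming (sequential) read pattern
    hints = pf.on_access(addr)
    if hints:
        print(f"access {addr} -> prefetch {hints}")
```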

Samsung Electronics and SK hynix, noting HBF's market potential, are experimenting with various approaches to the technical challenges standing in the way of implementation. Samsung Electronics is reportedly approaching development by integrating a "FinFET" structure, a 3D transistor technology, into the controller used to implement HBF. SK hynix is applying the VFO process (a semiconductor packaging technology that changes the wires connecting the chip and circuit from curved to vertical), previously used in HBM, to HBF to address production yield issues. Instead of drilling through the chips with TSVs, it aims to implement HBF with a structure that connects the stack vertically along the chip's outer edge.

A researcher at a market research firm said, "If HBF is commercialized, it could be a 'game changer' in the memory semiconductor market, just as HBM was," adding, "Since both Samsung Electronics and SK hynix are securing foundational technologies, I believe the companies that solve the challenges required for implementation will dominate the next-generation market."

※ This article has been translated by AI.