NVIDIA's next-generation artificial intelligence (AI) accelerator 'Blackwell,' which was delayed last year due to design flaws and overheating issues, continues to face problems after its release. Major customers, including Microsoft, Amazon Web Services (AWS), and Google, are reportedly postponing Blackwell orders or asking to switch to previous-generation AI chips.
Blackwell is considered the largest source of revenue for memory semiconductor companies this year. If its supply problems drag on, companies such as SK Hynix, Micron, and Samsung Electronics are unlikely to avoid the negative impact. The outlook for general-purpose DRAM, NAND flash, and enterprise solid-state drives (SSDs) is already gloomy this year, so if supply of high-bandwidth memory (HBM), which carries the highest profit margins, falls short of expectations, a direct blow is inevitable.
According to IT media outlet The Information on the 13th (local time), Microsoft, AWS, Google, and Meta Platforms have reportedly canceled some of their orders for NVIDIA's 'Blackwell GB200' racks. The report attributed the postponements and cancellations to overheating in the first shipment of racks equipped with Blackwell chips, along with problems in how the chips are interconnected. Racks are essential data center equipment that securely hold and connect chips, cables, and other necessary components.
Blackwell struggled with design issues throughout last year. When NVIDIA first unveiled Blackwell in March of last year, it said the product could launch in the second quarter of 2024, but the launch was postponed after defects were discovered during production. At its earnings announcement in August, the company revealed plans to begin mass production in the fourth quarter of last year, but further design issues again delayed supply.
This is expected to affect the investment plans of major U.S. tech firms that intend to invest heavily in AI servers this year. Meta and Google have purchased over $10 billion (approximately 13.96 trillion won) worth of the Grace Blackwell GB200, while Microsoft has reportedly ordered up to 65,000 units for use by itself and OpenAI.
Blackwell improves performance by 30% over the previous-generation AI chip 'H100' and cuts energy consumption to as little as one twenty-fifth, drawing high expectations from large IT companies. NVIDIA has reportedly invested over $10 billion (approximately 14 trillion won) in research and development (R&D) for Blackwell, and the price is set at $30,000 to $40,000 per unit.
However, the ongoing overheating issue, which poses the biggest risk to semiconductor performance, has heightened concerns about reliability across the IT industry. For instance, Microsoft, a partner of ChatGPT developer OpenAI, had planned to install more than 50,000 GB200 racks equipped with Blackwell chips in Phoenix, Arizona. As defects in Blackwell emerged, however, OpenAI reportedly asked Microsoft to provide NVIDIA's previous-generation 'Hopper' chips instead.
An official from a major domestic memory company stated, "The overheating issue with the Blackwell GB200 has been raised consistently since last year, and SK Hynix, the largest HBM supplier in the country, has continually been asked to improve power efficiency," adding, "A significant portion of the power used in AI data centers is consumed by memory, and because HBM is mounted in a 2.5D structure, unlike in traditional servers, its proximity to the processor exacerbates the overheating problem."
Domestic memory semiconductor companies, whose earnings this year hinge on HBM, are keeping a wary eye on the situation. This is because NVIDIA's latest GB200 NVL72 platform incorporates as many as 576 units of the latest HBM. Should the Blackwell defect issue persist, it is highly likely to deal a direct blow to the earnings of its main supplier, SK Hynix.
An official from a domestic securities firm noted, "I understand that many issues are arising with the first batch of NVIDIA's Blackwell GB200. Some products may not operate at all because of overheating," adding, "However, since this is still the first batch, the issues may be corrected later and mass supply resumed, but there is a high likelihood that revenue will be generated later than planned."