There is a saying, "Even if you go by a crooked path, it's fine as long as you reach Seoul." It is often used to mean that achieving the goal is all that matters, regardless of the means. It may be an apt description of the strategy Huawei has devised to break through the United States' sweeping technology sanctions in the ongoing AI semiconductor race.
Huawei said on the 26th that it plans to launch its latest AI chip, the Ascend 950, and an AI data center solution in global markets, including Korea, next year. It signaled its ambition to offer AI-related corporations a new option in an AI chip market centered on Nvidia.
The biggest question is how Huawei implemented an AI semiconductor, the culmination of cutting-edge technology. Unlike Nvidia or Broadcom, it cannot freely use TSMC's most advanced processes, nor can it procure high-bandwidth memory (HBM) from Samsung Electronics or SK hynix. Under these constraints, how will Huawei compete with Nvidia?
◇ "Chip performance is at a comparative disadvantage, offset by volume and bandwidth"
Huawei's Ascend 950, recently presented as a rival to Nvidia's graphics processing units (GPUs), remains largely undisclosed in terms of chip architecture and implementation, but announcements and industry analysis so far suggest a series of clever moves within hard constraints. Unable to manufacture chips on advanced nodes, Huawei appears to have opted to densely package a larger number of relatively lower-performance chips into a kind of cluster and to push inter-chip communication performance to the extreme.
Nvidia has typically focused on boosting the performance of individual GPU chips, which is why the industry's most advanced processes are always applied to Nvidia's GPU production. It raises the performance, density, and efficiency of a single chip, then ties chips together with high-speed inter-chip links and the CUDA software ecosystem to deliver high throughput with fewer GPUs. Finishing the same model with fewer chips in less time is Nvidia's strength.
Huawei appears to have made the opposite choice. Rather than chasing Nvidia in absolute single-chip performance and software ecosystem, it has put its weight on tightly connecting more chips and reaching the target throughput through total system capacity. It accepts the gap in single-chip performance and offsets it by increasing the number of nodes and amplifying communication bandwidth. This is why the industry calls it a cluster strategy. To reach the same destination, Huawei is, in effect, widening the road to Seoul with "volume" and "links."
The industry does not believe Huawei presented its performance targets without basis. The Ascend 950 series figures circulating in the market put interconnect bandwidth at around 2 TB/s. Rather than simply raising peak compute, Huawei has chosen to boost overall training and inference performance at the system level by increasing memory bandwidth and inter-node communication, which easily become bottlenecks in large models.
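The volume-and-bandwidth trade-off described above can be sketched with a toy model. Every chip count, TFLOPS figure, and efficiency factor below is a hypothetical assumption chosen for illustration, not a published Huawei or Nvidia specification:

```python
# Illustrative sketch of the cluster trade-off: all numbers are
# hypothetical assumptions, not actual Huawei or Nvidia specs.

def system_throughput(chips, per_chip_tflops, comm_efficiency):
    """Aggregate usable throughput: raw compute scaled by how well
    the interconnect keeps every chip fed (comm_efficiency in 0..1)."""
    return chips * per_chip_tflops * comm_efficiency

# Fewer, faster chips: high single-chip performance, fewer links to manage.
fast = system_throughput(chips=8, per_chip_tflops=1000, comm_efficiency=0.90)

# More, slower chips: lower per-chip performance, offset by sheer count
# and heavy investment in interconnect bandwidth.
cluster = system_throughput(chips=16, per_chip_tflops=600, comm_efficiency=0.85)

print(f"fast config: {fast:.0f} TFLOPS, cluster config: {cluster:.0f} TFLOPS")
```

Under these made-up numbers, the cluster of weaker chips edges out the smaller set of stronger ones, which is the essence of the strategy: the system, not the single chip, is the unit of competition.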
◇ Cost-saving effects for building AI infrastructure remain uncertain
The key is whether Huawei can offer an option attractive enough to make buyers choose it over the safe choice of Nvidia. The industry expects Huawei to aggressively lower prices and pitch cost savings to major IT companies, actively targeting those burdened by the skyrocketing prices of Nvidia's AI chips.
Some, however, say buyers should weigh not only the upfront expense of building AI infrastructure but also future operating costs. In the short term, Huawei's approach may look more cost-efficient than Nvidia's, but medium- to long-term power and operating costs need to be factored in, analysts said. An official at a major domestic cloud company said, "To achieve computational performance similar to Nvidia's, Huawei's model requires more chips and servers, and that increases the burden of power, cooling, rack space, and networking," adding, "The more chips you have, the harder it is to avoid rising synchronization and communication costs."
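The official's point about synchronization cost can be illustrated with a rough scaling model based on a ring all-reduce, a common way chips exchange gradients during training. Every figure here (gradient size, link speed, latency) is a hypothetical assumption, not vendor data:

```python
# Hypothetical sketch of why "more chips" raises synchronization cost.
# Models one training step as compute plus a ring all-reduce of gradients;
# all figures are illustrative assumptions, not vendor measurements.

def step_time(n_chips, total_compute_s, grad_gb, link_gbps, latency_s=5e-5):
    # Compute per chip shrinks as chips are added...
    compute = total_compute_s / n_chips
    # ...but each step must still all-reduce the full gradient volume.
    # A ring all-reduce moves about 2*(n-1)/n of the data over each link,
    # plus per-hop latency that grows with the ring size.
    comm = 2 * (n_chips - 1) / n_chips * grad_gb / link_gbps
    comm += (n_chips - 1) * latency_s
    return compute + comm

# Fixed workload: 10 s of single-chip compute, 14 GB of gradients,
# 200 GB/s links (all assumed values).
for n in (8, 16, 64, 256):
    t = step_time(n, total_compute_s=10.0, grad_gb=14, link_gbps=200)
    efficiency = (10.0 / n) / t  # fraction of the step spent computing
    print(f"{n:4d} chips: step {t * 1000:7.1f} ms, compute share {efficiency:.0%}")
```

Even in this simplified model, the compute share of each step falls steadily as the chip count grows, because the communication term barely shrinks while the compute term does. That is the "harder to avoid" cost the cloud official describes: scaling out with more chips buys throughput, but an increasing slice of every step goes to keeping them in sync.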