(From left) The DeepSeek and ChatGPT applications (apps)./Yonhap News

I had the Chinese artificial intelligence (AI) model DeepSeek R1 solve the common Korean language section of the 2024 College Scholastic Ability Test, an exam assessed as 'too easy.' Surprisingly, it got only 5 of the 34 questions wrong, for a total deduction of 12 points. It also handled the math section well, apart from a few difficult problems.

Moreover, R1 reportedly posted an accuracy of 79.8% on the American Invitational Mathematics Examination (AIME), edging out OpenAI's o1 (79.2%). In coding tests, it also outperformed o1 (63.4%) with an accuracy of 65.9%. TechCrunch noted that R1 has drawn evaluations putting it on par with or above o1 across a range of math, coding, and reasoning tasks.

◇ Thought to be merely cheap, it turns out to be smart as well… U.S. Big Tech on alert

As assessments emerge that DeepSeek surpasses ChatGPT not only on cost but also on technology, U.S. Big Tech companies are growing uneasy. DeepSeek first drew attention for its 'extreme cost efficiency' when it was unveiled about a week earlier. The company claims to have produced a model comparable to or better than o1 at a training cost of $5.5 million (about 7.9 billion won), or 5.4% of ChatGPT's training cost of $100 million (about 145.4 billion won).

If the inexpensive R1 is better than o1, customers may no longer feel the need to pay for a ChatGPT subscription. And since enormous development costs have been the biggest barrier to entry, latecomers are more likely to rush into the AI market. That day, the shares of Naver and Kakao, classified as domestic AI software stocks, closed at 216,500 won, up 6.13% from the previous trading day, and 38,350 won, up 7.27%, respectively.

Coincidentally, DeepSeek was unveiled right after U.S. President Donald Trump declared, 'We will dominate AI supremacy,' prompting analysis that the AI leadership war between the two countries is intensifying. The Financial Times reported that 'Chinese IT companies such as DeepSeek, Alibaba, Tencent, and ByteDance have been narrowing the gap with the U.S. while improving both cost efficiency and capability,' adding that this is not a coincidence but an innovation forced by the U.S. expansion of export restrictions on advanced chips.

When DeepSeek was asked on the morning of the 31st about transportation from Wangsimni Station in Seongdong-gu, Seoul, to the Koreana Hotel in Jung-gu, Seoul, it showed signs of hallucination./DeepSeek capture

◇ DeepSeek also can't overcome the ‘hallucination’ phenomenon

However, it remains to be seen whether the DeepSeek craze will continue. DeepSeek R1 also exhibits the 'hallucination' that plagues other AI models. AI hallucination refers to a phenomenon in which a conversational model presents false or irrelevant answers as if they were true. Hallucinations arise mainly from low-quality data or inadequate algorithms for filtering information.

As a result, it can fail even at basic questions. For example, when asked on the morning of the 31st about transportation from Wangsimni Station in Seongdong-gu, Seoul, to the Koreana Hotel in Gwanghwamun, Jung-gu, it replied that the trip would take 25 to 35 minutes. It gave a bus number but did not mention where to transfer, making it impossible to reach the destination. It even stated that taking a taxi would take 15 to 20 minutes and cost 800 million won.

In addition, the information-leak controversy that inevitably follows Chinese IT companies has come to the fore. Governments and corporations around the world, including in the U.S. and Europe, are urging their members not to install DeepSeek over concerns about data leakage.

There is also the question of how much DeepSeek's claimed development costs can be trusted. DeepSeek reportedly focused on 'reinforcement learning' instead of the 'supervised fine-tuning' traditionally used by established companies. Reinforcement learning is a technique that strengthens a model's reasoning ability by letting the AI find answers on its own and rewarding it for correct outcomes, which means performance can be improved at a lower cost than with supervised models that require labeled training data for specific fields.
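The distinction can be sketched in a few lines of code. The toy example below is purely illustrative and is not DeepSeek's actual training method: a supervised step needs the labeled answer for every question, while a reinforcement-learning step only needs a reward signal telling the model whether the answer it produced on its own turned out to be correct.

```python
import random

# Toy illustration (not DeepSeek's actual code): contrast supervised fine-tuning,
# which needs a labeled answer for every example, with reinforcement learning,
# which only needs a reward for the answer the model chose on its own.

# Hypothetical toy "model": a table of scores over candidate answers per question.
questions = {"2+2": ["3", "4", "5"], "3*3": ["6", "9", "12"]}
correct = {"2+2": "4", "3*3": "9"}
scores = {q: {a: 0.0 for a in answers} for q, answers in questions.items()}

def pick(q):
    # Sample an answer, favoring higher-scored candidates.
    answers = questions[q]
    weights = [2.0 ** scores[q][a] for a in answers]
    return random.choices(answers, weights=weights)[0]

def supervised_step(q, labeled_answer):
    # Supervised fine-tuning (sketch): push up the score of the given labeled answer.
    scores[q][labeled_answer] += 1.0

def rl_step(q):
    # Reinforcement learning (sketch): the model answers by itself,
    # then is rewarded or penalized based only on whether it was correct.
    answer = pick(q)
    reward = 1.0 if answer == correct[q] else -0.5
    scores[q][answer] += reward

for _ in range(200):
    for q in questions:
        rl_step(q)

# After training, the highest-scored answer per question should be the correct one.
print({q: max(s, key=s.get) for q, s in scores.items()})
```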

DeepSeek has disclosed its total training cost but avoids giving a clear explanation of the development process. This has raised suspicions that it may have improperly used OpenAI's training data. If the suspicion is true, DeepSeek's 'cost efficiency' advantage could be undermined, since the cost of obtaining that training data would also have to be counted.

Professor Joo Jae-geol of KAIST's Graduate School of AI said, 'There is a possibility that DeepSeek used ChatGPT's answers as model answers during the 'distillation' phase of the AI training process, which is a common method among AI companies.' 'Distillation' refers to a process in which an AI model is trained on the outputs of another model so that it develops similar capabilities.
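As a rough illustration of what distillation means in practice (a generic sketch, not DeepSeek's or OpenAI's actual code), a small 'student' model can be trained to reproduce the outputs of a larger 'teacher' model rather than learning from human-labeled data:

```python
import numpy as np

# Hypothetical toy distillation: a tiny logistic "student" learns to match the
# soft probability outputs of a stand-in "teacher" model on unlabeled inputs.

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(1000, 1))          # unlabeled inputs

def teacher(x):
    # Stand-in for a large model: soft probabilities for a binary output.
    return 1 / (1 + np.exp(-(2.0 * x[:, 0] + 0.5)))

teacher_probs = teacher(x)                       # the teacher's "model answers"

# Student: one weight and one bias, trained to match the teacher's outputs.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(5000):
    student_probs = 1 / (1 + np.exp(-(w * x[:, 0] + b)))
    err = student_probs - teacher_probs          # gradient of cross-entropy vs. soft targets
    w -= lr * np.mean(err * x[:, 0])
    b -= lr * np.mean(err)

print(f"student weights: w={w:.2f}, b={b:.2f}")  # should approach the teacher's 2.0 and 0.5
```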

Professor Joo also said, 'We cannot rule out the possibility that the claimed development cost of about 8 billion won is an exaggeration by China or DeepSeek,' adding, 'Even if roughly 8 billion won was spent in the final stage of development, the actual cost, including what was incurred through trial and error along the way, was likely much higher.'