OpenAI's o3 dominates xAI's Grok 4 in historic AI chess tournament

Graphic: Son Min-kyun

Leading generative artificial intelligence (AI) models have tested their strategy and reasoning abilities on the chessboard. The 'Kaggle AI Chess Tournament,' which featured eight of the latest models from global AI giants such as OpenAI, xAI, Google, and Anthropic, ended in a final that pitted the flagship models of Sam Altman and Elon Musk against each other. The event, which tests the strategic reasoning of large language models (LLMs) in real time through chess, drew attention as a contest not only of technology but also of the competing philosophies and leadership behind AI.

According to Kaggle, Google's data science platform, on the 8th, the tournament, which began on the 5th, was an experiment designed to test critical thinking and strategic judgment in a real game environment rather than through conventional standardized benchmark scores.

Eight models took part: OpenAI's 'o3' and 'o4 mini', xAI's 'Grok 4', Google's 'Gemini 2.5 Pro' and 'Gemini 2.5 Flash', Anthropic's 'Claude 4', 'R1' from China's DeepSeek, and 'Kimi K2' from China's Moonshot AI.

Chess engines were not allowed, and each model had to submit its next move as text, with no mouse or chessboard interface. Each model had up to 60 minutes to respond, and submitting the same invalid move three times resulted in an automatic loss. The format was meant to assess the ability to build strategies and make judgments in complex game situations, rather than merely to generate correct answers.
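The forfeit rule described above can be illustrated with a short sketch. This is not Kaggle's actual harness: the function name `play_turn`, the `is_valid` callback, and the choice to count repeats within a single turn are all assumptions made for illustration, since the article does not say whether the repeats must be consecutive.

```python
from collections import Counter

MAX_REPEATS = 3  # per the article: the same invalid move three times means a loss


def play_turn(proposals, is_valid):
    """Process one turn's text proposals (hypothetical helper).

    Returns ("move", m) for the first valid proposal, ("forfeit", m) once the
    same invalid move has been submitted MAX_REPEATS times, or
    ("pending", None) if the proposals run out first.
    """
    invalid_counts = Counter()
    for move in proposals:
        if is_valid(move):
            return ("move", move)
        invalid_counts[move] += 1
        if invalid_counts[move] >= MAX_REPEATS:
            return ("forfeit", move)
    return ("pending", None)


# Toy validity check: pretend only these two moves are legal in the position.
legal_moves = {"e4", "Nf3"}
result = play_turn(["Ke9", "Ke9", "Ke9"], lambda m: m in legal_moves)
```

In this toy run the model insists on the same impossible move three times, so `result` is `("forfeit", "Ke9")`. A real harness would check legality against the actual board position rather than a fixed set.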

The quarterfinals exposed a wide gap in skill between the top models and the rest. o3, from Sam Altman's OpenAI, swept Kimi K2 from China's Moonshot AI 4-0 without a single mistake, while Grok 4, from Musk's xAI, won just as comfortably, 4-0, against Google's lightweight Gemini 2.5 Flash. o4 mini beat DeepSeek's R1 and Gemini 2.5 Pro beat Claude 4, both also by 4-0, to reach the semifinals.

In the semifinals, only the match between Grok 4 and Google's Gemini 2.5 Pro was closely contested. The two models were evenly matched in strategic calculation and response, and in the final round Grok 4 made better use of its time to complete a 3-2 comeback victory. Meanwhile, OpenAI's o3 continued its unbeaten run with another 4-0 win, this time over its fellow OpenAI model, o4 mini.

In the third-place match, Gemini 2.5 Pro beat o4 mini 2.5-1.5. After winning the first two games in a row, it lost the last game but held on to its lead with steady calculation. o4 mini performed solidly overall but showed weaknesses in building a core strategy.

A scene from the final between OpenAI's 'o3' and xAI's 'Grok 4'. /Captured from YouTube

The final carried symbolic weight as well as technical significance. The contest between OpenAI's o3 and xAI's Grok 4 drew considerable attention because of the rivalry between the two companies' chiefs, Sam Altman and Elon Musk. The match itself, however, was one-sided. o3 calculated systematically and executed its strategy steadily, winning every game for a 4-0 sweep. It finished the tournament unbeaten, having not lost a single game from the quarterfinals through the final.

Magnus Carlsen, the world's top-ranked chess player and a former world champion, who commented on the match, said, "o3 appears to have a chess rating of around 1200, while Grok 4 seems to be around 800." He described the tournament as a chance to observe directly how AI actually thinks, much like 'Deep Blue vs. Kasparov' in the 1990s. According to the International Chess Federation (FIDE), a rating of 1200 corresponds to an average club player, while 800 is considered beginner level. The world's best players are rated above 2700 to 2800.
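To put that 400-point gap in perspective, the standard Elo expected-score formula can be applied to the two estimates. This is only an illustration: the ratings are Carlsen's rough impressions rather than official ratings, and the function name `expected_score` is chosen here for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score for player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Carlsen's informal estimates from the commentary, not official ratings.
o3_estimate = 1200
grok4_estimate = 800

o3_expected = expected_score(o3_estimate, grok4_estimate)
print(round(o3_expected, 3))  # 0.909
```

Under the Elo model, a 400-point advantage corresponds to an expected score of about 0.91 points per game, which is broadly in line with the one-sided result in the final.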

Immediately after Grok 4's defeat in the final, Musk wrote on X (formerly Twitter), "xAI put almost no effort into chess. Grok's chess skills are merely a secondary ability." Musk co-founded OpenAI with Sam Altman in 2015 but left the company in 2018 after a management dispute and went his own way, founding xAI in 2023. In an interview last year, Altman suggested that Musk was a "bully," a sign of how completely their relationship has broken down.

Kaggle plans to turn the tournament into an ongoing platform for evaluating AI performance rather than a one-off event. Meg Risdal, product manager at Kaggle, wrote on the official blog, "This AI chess tournament is not a one-time event but will develop as a continuous performance assessment standard," adding that Kaggle plans to expand beyond chess to test the reasoning and collaboration capabilities of LLMs in games such as Go, Mafia, and simulation games.
