OpenAI announced on the 19th (local time) that one of its unreleased artificial intelligence (AI) models achieved a gold medal-level score at the International Mathematical Olympiad (IMO), the world's most prestigious mathematics competition.
In a post on the social media platform X, OpenAI said it had achieved gold medal-level performance at the 2025 IMO with a general-purpose reasoning large language model (LLM).
Held every July, the IMO brings together mathematically gifted high school students from around the world, who tackle six challenging problems over two days. Each problem is worth 7 points, for a maximum of 42, and the full solution process, not just the final answer, is graded rigorously. A score in the mid-30s is typically considered gold medal range.
Working under the same conditions as human contestants, without internet access or external tools, OpenAI's model solved five of the six problems. It did not solve the sixth, which is considered the most difficult. Three former IMO medalists graded the solutions independently and gave the model a total of 35 points out of 42.
OpenAI stressed that the result came from a 'general-purpose' model applicable across many fields, not one built specifically to solve mathematical problems. By contrast, Google's 'AlphaProof,' which was reported last year to have reached silver medal level at the IMO, was a model specialized in mathematical proofs.
Sam Altman, CEO of OpenAI, called the result 'a significant advancement toward artificial general intelligence (AGI)' and 'an indicator of the pace of AI technology advancement over the last decade.' He clarified, however, that this experimental model is separate from the next-generation model GPT-5 and that there are no plans to release it in the coming months.
Gary Marcus, a prominent AI critic, was more cautious. He wrote on social media that 'the fact that it achieved this without tools or the internet is impressive,' but pointed out that 'there's a lack of information on this model's specific structure, training methods, and reasoning abilities in areas outside of mathematics, making it premature to assess the significance of its achievement.'