Artificial intelligence (AI) has reached the stage of directly proving complex mathematical theorems, going beyond simple calculation.
Google DeepMind researchers published findings on the AI system AlphaProof in the international journal Nature on the 13th. Following AlphaProof's silver medal–level performance at the International Mathematical Olympiad (IMO) in July last year, its operating principles have now been academically validated.
Mathematicians spend years, sometimes decades, proving new theorems. Proofs must be logically flawless; even a single erroneous line is unacceptable. AI had been able to solve math problems and build predictive models, but the realm of proof was different. Because the explanations produced by language models are mostly natural-language sentences, it has been difficult to mechanically verify their logical correctness.
The DeepMind team showed through AlphaProof that AI can perform complex mathematical reasoning the way humans do. The researchers used a proof-oriented programming language called Lean to formalize all mathematical expressions and reasoning steps. In this engineered environment, a computer can automatically determine whether a proof is correct.
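As a rough illustration of what such formalization looks like, the toy Lean 4 snippet below states and proves a trivial proposition. The example is ours, not one from the paper; the point is that once a statement is written this way, the Lean kernel mechanically checks every step, so an accepted proof is guaranteed correct.

```lean
-- Illustrative only: a toy proposition formalized in Lean 4, not an
-- example from the AlphaProof paper. The Lean kernel checks that the
-- proof term below actually has the stated type before accepting it.
theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b   -- reuse the library lemma; Lean verifies it fits

-- A flawed proof is rejected rather than silently accepted: replacing
-- the body above with, say, `rfl` fails to compile, because `a + b`
-- and `b + a` are not definitionally equal for arbitrary `a` and `b`.
```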
Combined with reinforcement learning, the AI attempted problems on its own and learned increasingly complex proofs by searching for approaches that came closer to a correct answer. Reinforcement learning is a training method in which an AI experiments by itself and receives higher rewards the closer it gets to the correct answer, thereby finding better solutions over time.
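The Python sketch below illustrates that reward-guided trial-and-error loop in miniature. Every name and detail here (ToyProofEnv, the tabular values, the binary reward) is a hypothetical simplification for illustration; AlphaProof itself pairs neural networks with an AlphaZero-style search rather than a lookup table.

```python
# Minimal sketch of a reward-guided learning loop. Hypothetical and
# greatly simplified; not DeepMind's actual training setup.
import random
from collections import defaultdict

class ToyProofEnv:
    """Stand-in for a Lean-like checker: a 'proof' is a sequence of
    steps, and reward is granted only if the checker accepts it."""
    GOAL = ("intro", "rewrite", "apply")  # the (unknown) accepted proof

    def score(self, steps):
        # Reward only complete, correct proofs, as a formal checker would.
        return 1.0 if tuple(steps) == self.GOAL else 0.0

ACTIONS = ["intro", "rewrite", "apply", "simp"]
values = defaultdict(float)   # learned value of each partial proof
env = ToyProofEnv()
EPSILON = 0.2                 # exploration rate

for episode in range(5000):
    steps = []
    for _ in range(3):
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)   # explore a random step
        else:                                 # exploit learned values
            action = max(ACTIONS, key=lambda a: values[tuple(steps + [a])])
        steps.append(action)
    reward = env.score(steps)
    # Credit every prefix of the attempted proof toward the final reward,
    # so promising partial proofs are revisited in later episodes.
    for i in range(1, len(steps) + 1):
        key = tuple(steps[:i])
        values[key] += 0.1 * (reward - values[key])

best = max(values, key=values.get)
print("highest-valued partial proof:", best, round(values[best], 2))
```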
AlphaProof trained by automatically formalizing more than 80 million mathematical propositions. When given particularly difficult problems, it generated millions of related variations and learned from them in real time to improve its adaptability.
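To give a sense of the variant-generation idea, the toy sketch below spins one seed problem into a family of related statements by perturbing its constants. This is purely our illustration of the concept, not DeepMind's actual procedure.

```python
# Hypothetical illustration of "variant generation": from one seed
# problem, enumerate many related statements by swapping constants.
import itertools

SEED = ("Prove that for all natural numbers n, "
        "n^2 + {a}*n + {b} is even when n is even.")

def make_variants(max_coeff):
    """Yield simple variants of the seed statement; some may be
    provable and some not, which is itself useful training signal."""
    for a, b in itertools.product(range(1, max_coeff + 1), repeat=2):
        yield SEED.format(a=a, b=b)

variants = list(make_variants(3))
print(len(variants), "variants; e.g.,", variants[0])
```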
As a result, AlphaProof, working together with AlphaGeometry, an AI for solving geometry problems, solved four of the six problems at the 2024 IMO. That performance corresponds to a silver medal by the standards applied to human participants, and it marked the first time an AI reached the IMO medal range.
In the paper, the researchers explained how the AI understood mathematics and learned proofs, and in what ways that process resembles human logical reasoning. "The combination of large-scale reinforcement learning and a formal logic environment enabled complex mathematical thinking," the team said, adding, "This approach will form the foundation for developing trustworthy mathematical AI tools in the future."
Talia Ringer, a professor at the University of Illinois at Urbana-Champaign who was not involved in the research, said, "AlphaProof is a system that learns mathematical proofs like a game." She added, "AlphaProof proved a theorem in one minute that students had been unable to solve for days, and it flagged incorrect definitions, pointing out errors in the problems themselves."
Ringer said, "Recently, OpenAI and Google achieved gold medal–level results on this year's IMO problems," adding, "AI's mathematical reasoning ability is evolving rapidly. A kind of 'space race' has begun in this field."
But mathematicians' reactions were mixed. Kevin Buzzard, a professor at Imperial College London, assessed that AlphaProof did little to help translate the proof of Fermat's Last Theorem, a famously difficult problem in mathematics, into the Lean language. "In my experience, no AI system so far has reached a usable level," Buzzard said.
The research team acknowledged these limits as well. "Although AlphaProof has demonstrated the ability to solve competition-level math problems, it still has limitations in areas that involve complex problems or require intuitive reasoning," they wrote, adding, "If these challenges are addressed, AlphaProof is expected to become a powerful assistant for mathematical research."
References
Nature (2025), DOI: https://doi.org/10.1038/s41586-025-09833-y
Nature (2025), DOI: https://doi.org/10.1038/d41586-025-03585-5