Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin; Shange Tang; Bohan Lyu; Ziran Yang; Jui-Hui Chung; Haoyu Zhao; Lai Jiang; Yihan Geng; Jiawei Ge; Jingruo Sun; Jiayun Wu; Jiri Gesi; Ximing Lu; David Acuna; Kaiyu Yang; Hongzhou Lin; Yejin Choi; Danqi Chen; Sanjeev Arora; Chi Jin

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Published: 26 Jan 2026, Last Modified: 11 Apr 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Theorem Proving, Reasoning

Abstract: Automated theorem proving (ATP) --- the task of generating a proof that passes automated proof verification given a math question in formal language --- is a critical challenge at the intersection of mathematics and Artificial Intelligence (AI). We introduce Goedel-Prover-V2, a family of two language models that establish a new state-of-the-art (SOTA) in open-source ATP, using the Lean proof assistant. In addition to standard expert iteration and reinforcement learning, our approach incorporates three key innovations: (1) During training when improvement plateaus on human questions, the prover does scaffolded data synthesis to generate synthetic questions of increasing difficulty for its own training; (2) The prover is trained to self-correct using Lean compiler feedback; (3) Improved test-time exploration through checkpoint averaging to balance accuracy and diversity. Our small model, Goedel-Prover-V2-8B, reaches 84.6\% pass@32 on MiniF2F and outperforms DeepSeek-Prover-V2-671B despite being $80\times$ smaller. Our flagship model, Goedel-Prover-V2-32B, achieves 88.1\% on MiniF2F at pass@32 in standard mode and 90.4\% in self-correction mode, outperforming prior SOTA by a large margin. Additionally, our flagship model solves 86 problems on PutnamBench at pass@184, securing first place among open-source models and surpassing DeepSeek-Prover-V2-671B's record of 47 problems by pass@1024 with about $20\times$ smaller model size and significantly lower compute budget. Our models, code, and data are released at \url{https://github.com/Goedel-LM/Goedel-Prover-V2}.

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 8236

Loading

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, Ximing Lu, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin