TL;DR: We design a self-play algorithm for theorem proving that iteratively conjectures new statements and proves them, achieving state-of-the-art performance on miniF2F and ProofNet among whole-proof generation methods.
Abstract: A fundamental challenge in formal theorem proving by LLMs is the lack of high-quality training data. Although reinforcement learning or expert iteration partially mitigates this issue by alternating between the LLM generating proofs and finetuning it on correctly generated ones, performance quickly plateaus due to the scarcity of correct proofs (sparse rewards). To keep improving the models with limited data, we draw inspiration from mathematicians, who continuously develop new results, partly by proposing novel conjectures or exercises (which are often variants of known results) and attempting to solve them. We design the Self-play Theorem Prover (STP) that simultaneously takes on two roles, conjecturer and prover, each providing training signals to the other. The conjecturer is trained iteratively on previously generated conjectures that are barely provable by the current prover, which incentivizes it to generate increasingly challenging conjectures over time. The prover attempts to prove the conjectures with standard expert iteration.
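The self-play loop described above can be sketched roughly as follows. This is a minimal illustration based only on the abstract, not the authors' implementation: the function names, the dummy stub components, and the specific "barely provable" pass-rate band are all hypothetical assumptions.

```python
def stp_round(generate_conjectures, sample_proofs, verify,
              seed_statements, n_samples=16, low=0.0, high=0.25):
    """One illustrative STP round: collect expert-iteration data for the
    prover and 'barely provable' conjectures for the conjecturer."""
    conjectures = generate_conjectures(seed_statements)

    prover_data = []       # verified (conjecture, proof) pairs
    conjecturer_data = []  # conjectures that are barely provable

    for c in conjectures:
        proofs = sample_proofs(c, n_samples)
        correct = [p for p in proofs if verify(c, p)]
        rate = len(correct) / n_samples
        # Prover signal: standard expert iteration on verified proofs.
        prover_data.extend((c, p) for p in correct)
        # Conjecturer signal: keep conjectures that were solved, but only
        # rarely, so generated conjectures get harder over time.
        if low < rate <= high:
            conjecturer_data.append(c)
    return prover_data, conjecturer_data


# Toy demonstration with stub components (placeholders, not real models).
def generate_conjectures(seeds):
    return [s + "'" for s in seeds]          # trivial "variants" of seeds

def sample_proofs(conjecture, n):
    return [f"proof_{i}" for i in range(n)]  # dummy proof candidates

def verify(conjecture, proof):
    return proof == "proof_0"                # 1 of 16 samples "succeeds"

prover_data, conjecturer_data = stp_round(
    generate_conjectures, sample_proofs, verify, ["thm_a", "thm_b"])
print(len(prover_data), conjecturer_data)
```

In a real system the stubs would be LLM sampling calls and a formal verifier (Lean or Isabelle), and each round would finetune the two roles on `prover_data` and `conjecturer_data` respectively.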
We evaluate STP with both Lean and Isabelle formal verifiers. With 51.3 billion tokens generated during training in Lean, STP proves 28.5% of the statements in the LeanWorkbook dataset, doubling the previous best result of 13.1% achieved through expert iteration.
The final model achieves state-of-the-art performance among whole-proof generation methods on miniF2F-test (65.0%), ProofNet-test (23.9%), and PutnamBench (8/644) with pass@3200.
Lay Summary: Proving mathematical theorems is one of the most rigorous forms of reasoning — and a major challenge for AI. Traditional systems rely heavily on large datasets of theorem statements, which are often hard to scale. In this work, we introduce STP (Self-play Theorem Prover), a new AI training method that takes a different approach: having the AI generate its own problems and attempt to solve them.
STP consists of two components: a conjecturer, which proposes new conjectures that are challenging yet approachable, and a prover, which tries to prove them. These two components are trained together, providing feedback to each other. We evaluate STP on formal proof benchmarks in Lean and Isabelle — two well-known formal proof languages — and achieve state-of-the-art performance, significantly outperforming previous methods based on classic RL algorithms.
STP mirrors how human mathematicians often work: by proposing new questions and learning through the process of solving them. This approach offers a promising step toward AI systems that can continue improving without needing more data from humans.
Link To Code: https://github.com/kfdong/STP
Primary Area: Deep Learning->Large Language Models
Keywords: formal theorem proving, AI for math, self-play, large language model, reasoning, reinforcement learning
Submission Number: 3991