Proof-Verifier: Enabling Reinforcement Learning from Verifiable Rewards for Mathematical Theorem Proving

ICLR 2026 Conference Submission 15797 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | Everyone | CC BY 4.0
Keywords: large language model, reinforcement learning, math theorem proof
TL;DR: We propose a proof verifier for mathematical theorem proving that enables RLVR in this domain.
Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has revolutionized mathematical reasoning, enabling models such as DeepSeek-R1 and OpenAI-o1 to achieve human-level performance on traditional math tasks whose answers are single numbers or equations. Extending RLVR to mathematical theorem proving remains challenging, however, because of a fundamental verification bottleneck: unlike traditional math tasks, theorem proving produces an entire reasoning process, and no reliable automated method exists to verify that process and turn it into a reward signal. In this work, we address this bottleneck by introducing Proof-Verifier, the first generative verifier specifically designed to enable RLVR for mathematical theorem proving. Proof-Verifier supports proofs written in both formal and informal (e.g., natural) language and provides the detailed verification that effective reinforcement learning requires. To train Proof-Verifier, we develop a formal-to-informal translation pipeline that generates high-quality synthetic data, and we employ a novel two-stage, coarse-grained to fine-grained reward modeling mechanism. Experiments show that Proof-Verifier achieves 93% verification accuracy, which yields reliable reward signals for RLVR. We further show that Proof-Verifier enables effective test-time scaling (a 79% win rate in best-of-N sampling and a 32% improvement in multi-turn proof refinement) as well as both single-turn and multi-turn RLVR training, consistently improving LLM-based theorem proving performance. Our work establishes a foundation for applying RLVR to mathematical theorem proving, extending the recent success of reasoning-enhanced models to this challenging domain.
Primary Area: reinforcement learning
Submission Number: 15797