Keywords: theorem proving, reinforcement learning, automated theorem proving, MCTS, reasoning, Lean
TL;DR: Online reinforcement learning for Lean theorem proving reaches near-SOTA performance while training on only a few hundred examples.
Abstract: We propose a scalable and efficient reinforcement learning framework as a strong baseline for theorem proving with limited data. This baseline reaches performance comparable to the current state of the art in theorem proving while training on only a few hundred examples. This is a first step toward an efficient and easily reproducible combination of autoformalization, synthetic data generation, and reinforcement learning, which could unlock significant advances in neural theorem proving.
Concurrent Submissions: N/A
Submission Number: 106