Reinforcement Learning for Hierarchical Proof Generation in Lean 4

Published: 17 Oct 2025, Last Modified: 21 Nov 2025, MATH-AI 2025 Poster, CC BY 4.0
Keywords: reinforcement learning, hierarchical inference, theorem proving, Lean, AI for math
TL;DR: RL for lemma-based proving yields strong, exploratory policies and overcomes plateaus in RL training.
Abstract: Scaling formal theorem proving to more complex problems requires more efficient inference methods than standard whole-proof generation, whose success rate decreases exponentially with proof length. In this work, we systematically study how online reinforcement learning can be coupled with a hierarchical (lemma-based) style of proof generation, which has recently gained popularity for inference-time methods. We show that the two interact fruitfully in both directions: reinforcement learning makes it possible to train proof decomposition policies successfully, and hierarchical inference helps overcome plateaus in reinforcement learning, since its richer output distribution favors exploration and the generated lemmas act as an online synthetic data generation technique. Overall, hierarchical inference trained with reinforcement learning achieves strong evaluation results, even at the 7B parameter scale, and outperforms standard whole-proof generation setups in sample efficiency and scalability, making it well suited to necessarily data-constrained formalization research efforts.
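
For illustration only (this sketch is not taken from the paper, and the names and statement are hypothetical), a minimal Lean 4 example of what lemma-based decomposition looks like: a helper lemma is proposed first, and the main goal is then closed by invoking it rather than by generating the whole proof in one pass.

-- Hypothetical helper lemma proposed by a decomposition policy.
theorem helper (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- Main goal closed by a short proof that invokes the generated lemma,
-- instead of producing the entire proof in a single generation.
theorem main_goal (x y : Nat) : (x + y) + 0 = y + x := by
  rw [Nat.add_zero]
  exact helper x y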
Submission Number: 238