Star-Corrector: A Multi-Turn Interactive Reinforcement Learning Framework for Lean4 Theorem Correction
Keywords: multi-turn interaction, reinforcement learning, multi-agent, formal proof
TL;DR: A Multi-Turn Reinforcement Learning Framework for Lean4 Theorem Proving via Environment Interaction
Abstract: Formal mathematical reasoning requires models to generate verifiably correct proofs, yet existing single-turn generation paradigms often fail to use the feedback that proof checkers provide. To bridge this gap, we present Star-Corrector (State-Thinking-Answer-Reward Corrector), a multi-turn, feedback-driven framework that explicitly learns from verification signals to refine faulty proofs. Our approach models proof generation as an iterative refinement process: starting from an initial flawed attempt, the model interacts with the Lean verifier at each turn to identify errors and progressively revise the proof. This multi-turn interaction allows the model to internalize corrective signals and learn precise repair strategies. The contributions of this work are threefold. First, we introduce a multi-turn interaction model that formally defines proof generation as iterative correction based on verifier feedback, moving beyond single-turn generation. Second, we optimize the policy within this interactive setting by applying GRPO to the sequential verification outcomes. Third, we develop a sampling strategy that dynamically balances problem difficulty during training, using pre-defined difficulty levels together with the model's evolving success rate. On the MiniF2F benchmark, Star-Corrector raises the pass rate from 64.34% (base model with 32 samples) to 96.72% under a 32+32 sampling budget, an absolute gain of 32.38 percentage points. The results indicate that our approach, particularly the adaptive sampler, improves generalization on medium and hard problems, supporting the importance of closed-loop, data-efficient training for formal reasoning. All code and data are available at an anonymous repository: https://anonymous.4open.science/r/ICLR-Star-E096/.
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 7742
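Illustrative sketch (not taken from the submission): the abstract describes a verifier-in-the-loop correction process trained with GRPO. The minimal Python below shows what such a loop and a group-relative advantage computation might look like. The `Verifier` and `Reviser` interfaces, the `max_turns` default, and the use of the population standard deviation are all assumptions made for illustration; the authors' actual implementation is in their repository and may differ.

```python
from statistics import mean, pstdev
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces; the submission does not specify these signatures.
Verifier = Callable[[str], Tuple[bool, str]]  # proof -> (accepted, error message)
Reviser = Callable[[str, str], str]           # (proof, error message) -> revised proof


def correct_proof(proof: str, verify: Verifier, revise: Reviser,
                  max_turns: int = 4) -> Tuple[Optional[str], int]:
    """Revise `proof` until the verifier accepts it or the turn budget runs out.

    Returns (accepted proof or None, number of verifier calls made).
    """
    for turn in range(1, max_turns + 1):
        accepted, feedback = verify(proof)
        if accepted:
            return proof, turn
        if turn < max_turns:
            # Only revise when another verification turn remains.
            proof = revise(proof, feedback)
    return None, max_turns


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: center and scale each reward within its sampled group.

    Assumption: population standard deviation; implementations vary on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With a toy verifier that accepts any proof ending in `rfl` and a reviser that swaps `sorry` for `rfl`, `correct_proof` succeeds on the second turn. In a real setting the verifier would invoke Lean and the reviser would be the policy model conditioned on the error message.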