ProofRepairBench: Exploring Proof Repair in Lean

Published: 02 Mar 2026, Last Modified: 24 Mar 2026
Venue: ICLR 2026 Workshop VerifAI-2
Visibility: Everyone
License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: theorem proving, proof repair, AI for Math, Lean theorem prover
TL;DR: We construct RepairBench, a benchmark which isolates the proof repair task and presents 127 problems; we use RepairBench to evaluate five repair strategies.
Abstract: Proof repair is an emerging task in AI systems for formal theorem proving: given an incorrect proof and an error message from the underlying proof assistant, the goal is to generate a correct proof. As AI proving systems shift toward whole-proof generation, the rigor of proof-checking kernels means that one-shot-generated proofs are rarely error-free. However, such proofs are often easily repairable, which makes proof repair a crucial component of AI proving systems and an important research target. To study the proof repair task in isolation, we present RepairBench---a benchmark of 127 challenging repair problems in Lean. We use this benchmark to evaluate a set of language models coupled with various meta-generation strategies for proof repair and draw a number of practical takeaways. We release the benchmark to make it a useful tool for the community to advance AI proving systems.
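To make the task concrete, here is a minimal, hypothetical instance of the kind of repair problem the abstract describes (not drawn from the benchmark itself): a proof attempt that Lean rejects, the resulting error, and a repaired proof that the kernel accepts.

```lean
-- Hypothetical repair problem (illustrative only, not from RepairBench).
-- Incorrect proof attempt: `rfl` fails because `a + b` and `b + a` are
-- not definitionally equal for arbitrary natural numbers.
theorem add_comm_broken (a b : Nat) : a + b = b + a := by
  rfl
-- Lean reports an error along the lines of:
--   tactic 'rfl' failed, the left-hand side and right-hand side
--   are not definitionally equal

-- Repaired proof: a repair system would take the failed proof and the
-- error message above as input, and produce a correct proof, e.g. by
-- invoking the standard library lemma `Nat.add_comm`.
theorem add_comm_repaired (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```

The repair here is a single tactic substitution; the benchmark's problems are described as more challenging, but the input/output shape of the task (broken proof + error message in, correct proof out) is the same.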
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 57