ProofAug: Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose ProofAug, a novel NTP method that achieves a new SOTA (66.0%) on miniF2F-test with Isabelle by equipping proof-generation LLMs with automation methods at different granularities via fine-grained proof structure analysis.
Abstract: The synergy between deep learning models and traditional automation tools, such as the built-in tactics of proof assistants and off-the-shelf automated theorem provers, plays a crucial role in developing robust and efficient neural theorem provers (NTPs). However, for proof synthesis with LLMs, previous work applies automation tools either only when explicitly invoked by the model or at a single granularity level, failing to fully exploit their power. To solve this issue, we propose ProofAug, a procedure that equips LLMs with automation methods at various granularities through fine-grained structure analysis of model-generated proof proposals. ProofAug also serves as a versatile plug-and-play module that seamlessly integrates with any tree-search algorithm, enabling our construction of an efficient recursive proving (ERP) module to further enhance performance. The superiority of our method is validated on the miniF2F benchmark using the open-source deepseek-math-7b-base model and the Isabelle proof assistant. Notably, by additionally employing a mixed prompting strategy, we achieve a cumulative pass rate of 66.0% on a curated version of the dataset (61.9% on the original version) with 2100 queries to the model per problem (in contrast, the previous SOTA in Isabelle, Subgoal-XL, achieves only 56.1% using 16384 queries per problem). We also implement a Lean 4 version of ProofAug that improves the pass@1 performance of Kimina-Prover-Preview-Distill-1.5B from 44.3% to 50.4% on miniF2F-test. Our code is available at https://github.com/haoxiongliu/ProofAug.
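To make the granularity idea concrete, below is a minimal Python sketch of the kind of substitution loop the abstract describes. It is an illustration under assumptions, not the authors' implementation: `Block`, `verify`, and the fallback tactic list are hypothetical stand-ins for a proof-structure parser, a proof-assistant checker, and Isabelle automation calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical names for illustration: `Block` and the `verify` callback are
# stand-ins, not the actual interfaces of the ProofAug repository.

@dataclass
class Block:
    header: str  # e.g. a "have ..." line in a structured Isabelle proof
    body: str    # the model-written justification, e.g. "by (induct n) auto"

def proofaug(
    statement: str,
    blocks: list[Block],
    verify: Callable[[str, str], bool],
    fallbacks: tuple[str, ...] = ("by auto", "by simp", "sledgehammer"),
) -> Optional[str]:
    """Check a model-generated proof proposal; when it fails, substitute
    automation tactics block by block before falling back to pure automation."""

    def render(bs: list[Block]) -> str:
        return "\n".join(f"{b.header} {b.body}" for b in bs)

    if verify(statement, render(blocks)):  # finest granularity: keep the proposal
        return render(blocks)
    for i, block in enumerate(blocks):     # medium: patch one failing block
        for tactic in fallbacks:
            patched = blocks[:i] + [Block(block.header, tactic)] + blocks[i + 1:]
            if verify(statement, render(patched)):
                return render(patched)
    for tactic in fallbacks:               # coarsest: discard the proposal entirely
        if verify(statement, tactic):
            return tactic
    return None  # no candidate at any granularity was accepted
```

In the actual method, the intermediate step would operate over the nested structure of the proof (sub-blocks within blocks) rather than a flat list; this sketch only shows the fall-through across granularities.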
Lay Summary: Proof assistants like Isabelle and Lean help verify mathematical theorems, but even with recent advances in large language models (LLMs), generating correct proofs remains a major challenge. We observe that one issue with existing systems is that they do not fully leverage existing off-the-shelf automation tools, often calling them only when explicitly told to, or at a fixed level of detail. We therefore developed ProofAug, a new method that helps LLMs write better proofs by smartly integrating automation tools at multiple levels of granularity. It does this by analyzing the structure of partial proofs and inserting the right tactic at the right time, like a good co-pilot helping the model along the way. ProofAug works with a range of search strategies and supports both Isabelle and Lean. On benchmark tests, it sets new performance records for proof generation while using a smaller computation budget than previous methods. We hope this brings us one step closer to truly capable AI provers.
Link To Code: https://github.com/haoxiongliu/ProofAug
Primary Area: Deep Learning->Large Language Models
Keywords: Neural Theorem Proving, Large Language Models
Submission Number: 9807