Keywords: Interactive theorem proving, large language models, LLM agents, reasoning, mathematics, search
TL;DR: We introduce a GPT-4-based math agent for Coq that uses either step-by-step interactive tactic generation or hierarchical proof templating, combined with search.
Abstract: Formal theorem proving is challenging for humans as well as for machines. Thanks to recent advances in LLM capabilities, we believe natural language can serve as a universal interface for reasoning about formal proofs. In this paper, 1) we introduce Pétanque, a new lightweight environment to interact with the Coq theorem prover; 2) we present two interactive proof protocols leveraging natural language as an intermediate representation for designing proof steps; 3) we implement beam search over these interaction protocols, using natural language to rerank proof candidates; and 4) we use Pétanque to benchmark our search algorithms. With GPT-4o, our method successfully synthesizes proofs for 58% of the first 100 (out of 260) lemmas from the newly published Busy Beaver proofs.
Concurrent Submissions: N/A
Submission Number: 74
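Illustrative sketch (not from the paper): the abstract mentions beam search over proof steps with reranking of candidates. The Python below shows the general shape of such a search under loud assumptions. It does not use Pétanque, Coq, or an LLM; `propose`, `score`, and `is_proved` are hypothetical callbacks standing in for tactic generation, natural-language reranking, and goal checking, and the toy state is just an integer counting remaining work.

```python
from typing import Callable, List, Optional, Tuple

State = int                    # toy stand-in for a proof state (assumption)
Step = Tuple[str, State]       # (tactic name, resulting state)


def beam_search(
    initial: State,
    propose: Callable[[State], List[Step]],   # hypothetical: candidate next steps
    score: Callable[[State], float],          # hypothetical: reranker, higher is better
    is_proved: Callable[[State], bool],       # hypothetical: goal check
    beam_width: int = 3,
    max_depth: int = 10,
) -> Optional[List[str]]:
    """Return a list of tactic names that reaches a proved state, or None."""
    beam: List[Tuple[List[str], State]] = [([], initial)]
    for _ in range(max_depth):
        candidates: List[Tuple[List[str], State]] = []
        for tactics, state in beam:
            for tactic, next_state in propose(state):
                path = tactics + [tactic]
                if is_proved(next_state):
                    return path
                candidates.append((path, next_state))
        if not candidates:
            return None  # every branch is stuck
        # Rerank all expansions and keep only the best `beam_width` of them.
        candidates.sort(key=lambda c: score(c[1]), reverse=True)
        beam = candidates[:beam_width]
    return None  # depth budget exhausted


# Toy instance: a state is the amount of work left; each step removes 1 or 2.
def toy_propose(state: State) -> List[Step]:
    return [(f"step{k}", state - k) for k in (1, 2) if state - k >= 0]


def toy_score(state: State) -> float:
    return -float(state)  # less remaining work ranks higher


proof = beam_search(5, toy_propose, toy_score, lambda s: s == 0)
```

In a real setting the scoring callback would be the expensive part, so the beam width bounds how many candidate states are kept per depth level rather than how many are generated.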