Ax-Prover: Agentic LEAN Proving with LLMs and MCP-based Verifiers

ICLR 2026 Conference Submission 20570 Authors

19 Sept 2025 (modified: 23 Dec 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Agentic AI, LLMs, Formal Verification, MCPs, Physics

Abstract: We present Ax-Prover, a domain-agnostic multi-agent system for theorem proving. The task of generating formal proofs requires both creativity and precise formalization. Ax-Prover addresses this challenge by combining large language models (LLMs), which contribute knowledge and reasoning ability, with MCP-based verification and validation tools, which enforce rigor and syntactic correctness. We benchmark our approach on the large-scale NuminaMath-LEAN dataset and introduce two new datasets: one in Abstract Algebra and one in Quantum Theory. Our experiments show that Ax-Prover consistently outperforms state-of-the-art (SOTA) provers across domains. Notably, we find a large performance gap in the newly introduced domains, suggesting that while Ax-Prover adapts readily to novel areas, existing SOTA systems remain highly specialized to their training domains and struggle to generalize.

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 20570
