Keywords: Agentic AI, LLMs, Formal Verification, MCPs, Physics
Abstract: We present Ax-Prover, a domain-agnostic multi-agent system for theorem proving. The task of generating formal proofs requires both creativity and precise formalization. Ax-Prover addresses this challenge by combining large language models (LLMs), which contribute knowledge and reasoning ability, with MCP-based verification and validation tools, which enforce rigor and syntactic correctness. We benchmark our approach on the large-scale NuminaMath-LEAN dataset and introduce two new datasets: one in Abstract Algebra and one in Quantum Theory. Our experiments show that Ax-Prover consistently outperforms state-of-the-art (SOTA) provers across domains. Notably, we find a large performance gap in the newly introduced domains, suggesting that while Ax-Prover adapts readily to novel areas, existing SOTA systems remain highly specialized to their training domains and struggle to generalize.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20570