Hilbert: Recursively Building Formal Proofs with Informal Reasoning

Published: 17 Oct 2025, Last Modified: 21 Nov 2025. Venue: MATH-AI 2025 Poster. License: CC BY 4.0
Keywords: Formal Mathematics, Automated Theorem Proving, Mathematical Reasoning, Lean 4, LLMs for Math, Agents
TL;DR: We built an AI system that combines informal math reasoning with formal proof verification, achieving state-of-the-art results on formal math benchmarks.
Abstract: Large Language Models (LLMs) demonstrate impressive mathematical reasoning abilities, but their solutions frequently contain errors that cannot be automatically verified. Formal theorem proving systems such as Lean 4 offer automated verification with complete accuracy, but current prover LLMs solve substantially fewer problems than general-purpose LLMs operating in natural language. We introduce Hilbert, an agentic framework that bridges this gap by combining the complementary strengths of informal reasoning and formal verification. Our system orchestrates four components: an informal LLM that excels at mathematical reasoning, a specialized prover LLM optimized for Lean 4 tactics, a formal verifier, and a semantic theorem retriever. Given a problem the prover cannot solve, Hilbert recursively decomposes it into subgoals that are solved by the prover or reasoner LLM, using verifier feedback to refine incorrect proofs. Experiments demonstrate that Hilbert substantially outperforms existing approaches. It achieves 99.2% on MiniF2F (6.6 percentage points above the best publicly available method) and the **best known result** on PutnamBench with 462/660 problems solved (70.0%), outperforming proprietary approaches like SeedProver (50.4%) and achieving a 422% improvement over the best publicly available baseline.
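The recursive decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Components`, `prove`, and `max_depth` are hypothetical, the prover, verifier, and decomposer are toy stand-ins for the LLMs and the Lean 4 checker, and the retriever and the feedback-driven refinement of failed proofs are left out. It only shows the control flow of trying a direct proof, falling back to subgoals, and recursing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Components:
    # All three callables are hypothetical stand-ins, not Hilbert's real interfaces.
    prover: Callable[[str], Optional[str]]   # goal -> candidate proof, or None
    verifier: Callable[[str, str], bool]     # (goal, proof) -> accepted?
    decompose: Callable[[str], List[str]]    # goal -> subgoals (informal reasoner)


def prove(goal: str, c: Components, depth: int = 0, max_depth: int = 3) -> Optional[str]:
    """Try a direct proof first; on failure, split into subgoals and recurse."""
    candidate = c.prover(goal)
    if candidate is not None and c.verifier(goal, candidate):
        return candidate
    if depth >= max_depth:
        return None  # give up instead of decomposing forever
    subgoals = c.decompose(goal)
    if not subgoals:
        return None
    parts = []
    for sub in subgoals:
        sub_proof = prove(sub, c, depth + 1, max_depth)
        if sub_proof is None:
            return None  # one unsolved subgoal fails the whole goal
        parts.append(sub_proof)
    # Assumption: concatenated subproofs stand in for the assembled final proof.
    return "\n".join(parts)


# Toy setup: the "prover" only handles atomic goals; conjunctions must be split.
toy = Components(
    prover=lambda g: None if " and " in g else f"proof({g})",
    verifier=lambda g, p: p == f"proof({g})",
    decompose=lambda g: g.split(" and ") if " and " in g else [],
)
```

In this toy setup, `prove("A and B", toy)` fails the direct attempt, splits the goal into `A` and `B`, and proves each one separately. The depth cap is a design choice made for the sketch so that the recursion always terminates.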
Submission Number: 99