ProofGym: Unifying LLM-Based Theorem Proving Across Formal Systems

Published: 17 Oct 2025, Last Modified: 21 Nov 2025, MATH-AI 2025 Poster, CC BY 4.0
Keywords: Automated Theorem Proving, LLMs, Infra
Abstract: Large language models (LLMs) have accelerated progress in automated theorem proving, but most systems remain confined to a single proof assistant, hindering cross-system reuse of reasoning patterns and complicating scalable evaluation. We present ProofGym, a lightweight, high-throughput backend that unifies interaction with heterogeneous proof assistants (Coq, Isabelle, Lean) behind a common Python API. ProofGym supports both whole-proof and interactive stepwise modes, offers a language-agnostic state/result schema, enables non-blocking batched execution with bounded concurrency, and emits structured logs suitable for dataset curation and evaluator development. Preliminary experiments show substantial end-to-end throughput improvements for verification and proof search while preserving per-request latency. This paper focuses on system design, abstractions, and cross-system pipelines; full-scale training and broader ablations are left as ongoing work.
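The abstract does not spell out ProofGym's API. The following is a minimal, hypothetical Python sketch of two ideas it names: a language-agnostic result schema and non-blocking batched execution with bounded concurrency. Every name here (`ProofRequest`, `ProofResult`, `Status`, `run_batch`, `stub_checker`) is an illustrative assumption, not ProofGym's interface. The stub checker only stands in for a real Coq, Isabelle, or Lean backend.

```python
# Hypothetical sketch; NOT ProofGym's actual API. Names and fields are illustrative.
import asyncio
import json
from dataclasses import dataclass
from enum import Enum
from typing import Awaitable, Callable, Dict, List, Sequence


class Status(str, Enum):
    """System-independent outcome of a verification request (assumed schema)."""
    SUCCESS = "success"
    FAILURE = "failure"
    TIMEOUT = "timeout"


@dataclass(frozen=True)
class ProofRequest:
    system: str   # backend key, e.g. "coq", "isabelle", "lean"
    theorem: str  # statement in the backend's own syntax
    proof: str    # candidate whole proof


@dataclass
class ProofResult:
    system: str
    status: Status
    message: str = ""
    elapsed_s: float = 0.0

    def to_log_line(self) -> str:
        # One JSON object per result, suitable for appending to a JSON Lines log.
        return json.dumps({
            "system": self.system,
            "status": self.status.value,
            "message": self.message,
            "elapsed_s": round(self.elapsed_s, 4),
        }, sort_keys=True)


# A checker wraps one proof assistant behind the same async interface.
Checker = Callable[[ProofRequest], Awaitable[ProofResult]]


async def run_batch(
    requests: Sequence[ProofRequest],
    checkers: Dict[str, Checker],
    max_concurrency: int = 4,
    timeout_s: float = 30.0,
) -> List[ProofResult]:
    """Verify all requests concurrently, with at most `max_concurrency` in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    loop = asyncio.get_running_loop()

    async def run_one(req: ProofRequest) -> ProofResult:
        checker = checkers.get(req.system)
        if checker is None:
            return ProofResult(req.system, Status.FAILURE, "no checker registered")
        async with sem:  # bounded concurrency: waits here while the pool is full
            start = loop.time()
            try:
                result = await asyncio.wait_for(checker(req), timeout=timeout_s)
            except asyncio.TimeoutError:
                result = ProofResult(req.system, Status.TIMEOUT, f"exceeded {timeout_s}s")
            result.elapsed_s = loop.time() - start
            return result

    # gather() runs the tasks concurrently and returns results in request order.
    return list(await asyncio.gather(*(run_one(r) for r in requests)))


async def stub_checker(req: ProofRequest) -> ProofResult:
    # Stand-in for a real backend: rejects proofs that contain an admit marker.
    await asyncio.sleep(0.01)
    admitted = any(tok in req.proof for tok in ("sorry", "Admitted"))
    status = Status.FAILURE if admitted else Status.SUCCESS
    return ProofResult(req.system, status)


if __name__ == "__main__":
    batch = [
        ProofRequest("lean", "theorem t : 1 + 1 = 2", "by norm_num"),
        ProofRequest("coq", "Lemma t : 1 + 1 = 2.", "Proof. Admitted."),
    ]
    stubs = {"lean": stub_checker, "coq": stub_checker}
    for res in asyncio.run(run_batch(batch, stubs, max_concurrency=2)):
        print(res.to_log_line())
```

In this sketch, the semaphore caps how many checks run at once. With real backends, that would also cap how many prover processes are alive. Meanwhile, `asyncio.gather` lets the caller submit a whole batch without blocking on each request. Keeping the result schema free of system-specific fields is what allows the same logging and evaluation code to consume outputs from every backend.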
Submission Number: 218