Keywords: Automated Theorem Proving, LLM, Combinatorics
Abstract: Formal theorem proving with large language models (LLMs) has demonstrated promising results, yet combinatorial problems remain a notable challenge due to their reliance on problem-specific structures and definitions. AlphaProof, a prominent LLM-based system for automated theorem proving, achieved silver-medal performance at the International Mathematical Olympiad (IMO), solving every problem except the two combinatorics problems. Existing formal benchmarks offer limited combinatorial coverage and often overlook the importance of combinatorial structures. To address these gaps, we introduce CombStruct4Lean, a novel benchmark of 282 combinatorial problems formalized in the Lean4 proof assistant. CombStruct4Lean emphasizes defining and reasoning with combinatorial structures, exhibiting significantly greater diversity than existing datasets. We conduct a novel analysis based on constructability, the challenge of proving that a defined structure is inhabited, to quantify the complexity of CombStruct4Lean relative to existing benchmarks. We evaluate state-of-the-art automated theorem proving methods on our benchmark, revealing substantial room for improvement and highlighting the difficulty of reasoning with combinatorial structures.
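To make the notion of constructability concrete: it asks whether a problem-specific structure, once defined, can be shown to have at least one witness. A minimal sketch in Lean4, using a hypothetical structure (the names below are illustrative, not taken from CombStruct4Lean):

```lean
-- A toy combinatorial structure: a graph on n vertices whose
-- edge list only mentions valid vertex indices.
structure SmallGraph where
  n     : Nat
  edges : List (Nat × Nat)
  valid : ∀ e ∈ edges, e.1 < n ∧ e.2 < n

-- "Constructability": prove the structure is inhabited by
-- exhibiting an explicit witness (the empty graph).
example : Nonempty SmallGraph :=
  ⟨⟨0, [], by simp⟩⟩
```

For richer structures (colorings, tournaments, set systems), exhibiting such a witness can itself require nontrivial combinatorial reasoning, which is why the paper uses constructability as a complexity measure.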
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 20542