Keywords: Automated Theorem Proving, LLM, Combinatorics
Abstract: Formal theorem proving with large language models (LLMs) has demonstrated promising results, yet combinatorial problems remain a notable challenge due to their reliance on problem-specific structures and definitions. AlphaProof, a prominent LLM-based system for automated theorem proving, achieved silver-medal performance at the International Mathematical Olympiad (IMO), solving every problem except the two combinatorics problems. Existing formal benchmarks offer limited combinatorial coverage and often overlook the importance of combinatorial structures. To address these gaps, we introduce CombStruct4Lean, a novel benchmark of 282 combinatorial problems formalized in the Lean4 proof assistant. CombStruct4Lean emphasizes defining and reasoning with combinatorial structures, exhibiting significantly greater diversity than existing datasets. We conduct a novel analysis based on constructability, the challenge of proving that a defined structure is inhabited, to quantify the complexity of CombStruct4Lean relative to existing benchmarks. We evaluate state-of-the-art automated theorem proving methods on our benchmark, revealing substantial room for improvement and highlighting the difficulty of reasoning with combinatorial structures.
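To make the notion of constructability concrete: it asks whether a problem-specific structure, once defined, can be shown to have at least one witness. A minimal sketch in Lean4, using a hypothetical structure (the names below are illustrative, not taken from CombStruct4Lean):

```lean
-- A toy combinatorial structure: a graph on n vertices whose
-- edge list only mentions valid vertex indices.
structure SmallGraph where
  n     : Nat
  edges : List (Nat × Nat)
  valid : ∀ e ∈ edges, e.1 < n ∧ e.2 < n

-- "Constructability": prove the structure is inhabited by
-- exhibiting an explicit witness (the empty graph).
example : Nonempty SmallGraph :=
  ⟨⟨0, [], by simp⟩⟩
```

For richer structures (colorings, tournaments, set systems), exhibiting such a witness can itself require nontrivial combinatorial reasoning, which is why the paper uses constructability as a complexity measure.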
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 20542