Keywords: Automated Reasoning, Theorem Proving, Autoformalization, AI for Math, LLM
TL;DR: We introduce ECP, a modular neuro-symbolic pipeline that combines LLM-driven enumeration and conjecturing with theorem provers to tackle constructive olympiad-level math problems in Lean.
Abstract: Mathematical reasoning is central to artificial intelligence, with applications in education, code generation, and research-level mathematical discovery. Mathematical competitions feature two problem types: theorem proving, which requires rigorous proofs, and answer construction, which requires creatively generating mathematical objects and formally verifying them. Existing research shows that LLMs can tackle difficult answer-construction tasks but are prone to errors from hallucinations and unverifiable steps, while symbolic methods guarantee rigor but falter in creative answer construction. This raises a key understudied question: how can answer-construction problems be solved while preserving both LLM creativity and mathematical rigor? To address this question, we introduce the Enumerate–Conjecture–Prove (ECP) framework, a modular neuro-symbolic method that integrates LLM-based enumeration and pattern-driven conjecturing with formal theorem proving in Lean, and ConstructiveBench, a dataset of 3,640 formal answer-construction problems from math competitions. ECP is model-agnostic and shows consistent improvements over pure LLM baselines. On the answer-construction subset of PutnamBench, ECP with GPT-5 mini and DeepSeek-Prover-V2-7B formally solves 6 of 337 problems end-to-end (up from 4 without ECP). On ConstructiveBench, ECP achieves a state-of-the-art end-to-end accuracy of 33.1% (up from 32.5%), demonstrating its potential to advance formal mathematical reasoning by combining LLM conjecturing with formal verification.
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 13730