IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

ICLR 2026 Conference Submission 20701 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: autoformalization, automated theorem proving, human-ai collaboration, benchmark, lean
Abstract: Reliable autoformalization remains an elusive goal even in the era of large language models (LLMs). Even the best LLMs struggle to translate natural language into formal constructs in languages like Lean. High-quality data has been a key bottleneck, given the resource costs associated with manually curating and validating these translations. To this end, we introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered, human-assisted pipeline for formalizing natural language problems in Lean. IndiMathBench comprises 416 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Our pipeline synthesizes multiple candidate formalizations from an ensemble of LLMs, validates them with the Lean prover, and summarizes the results for human validators through an interactive dashboard. This dashboard enables efficient validation and repair while also capturing valuable human code editing data. We analyze the performance and failure modes of several state-of-the-art models through our pipeline, and release IndiMathBench together with our analysis of human code edits to facilitate further research on automated theorem proving.
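To illustrate the kind of informal-formal pairing the abstract describes, here is a hypothetical example (not taken from the benchmark) of an olympiad-style statement alongside a candidate Lean 4 formalization, sketched under the assumption that Mathlib is available; the theorem name and proof are illustrative only:

```lean
import Mathlib

-- Informal statement: "For every natural number n, n² + n is even."
-- A candidate formalization of the kind an LLM ensemble might propose,
-- which the pipeline would then type-check with Lean before human review.
theorem even_sq_add_self (n : ℕ) : Even (n ^ 2 + n) := by
  -- Rewrite n² + n as n * (n + 1), a product of consecutive naturals.
  have h : n ^ 2 + n = n * (n + 1) := by ring
  rw [h]
  exact Nat.even_mul_succ_self n
```

In the pipeline described above, a candidate like this would first be validated by the Lean prover; only type-checking candidates (and their failures) are then surfaced to human validators via the dashboard.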
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 20701