IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

ICLR 2026 Conference Submission 20701 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: autoformalization, automated theorem proving, human-ai collaboration, benchmark, lean
Abstract: Reliable autoformalization remains an elusive goal even in the era of large language models (LLMs). Even the best LLMs struggle to translate natural language into formal constructs in languages like Lean. High-quality data has been a key bottleneck, given the resource costs associated with manually curating and validating these translations. To this end, we introduce IndiMathBench, a human-verified benchmark designed to evaluate mathematical theorem proving, curated using an AI-powered, human-assisted pipeline for formalizing natural language problems in Lean. IndiMathBench comprises 416 formal Lean 4 theorems paired with their corresponding informal problem statements, sourced from Indian Mathematics Olympiads. Our pipeline synthesizes multiple candidate formalizations from an ensemble of LLMs, validates them with the Lean prover, and summarizes the results for human validators through an interactive dashboard. This dashboard enables efficient validation and repair while also capturing valuable human code editing data. We analyze the performance and failure modes of several state-of-the-art models through our pipeline, and release IndiMathBench together with our analysis of human code edits to facilitate further research on automated theorem proving.
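To illustrate the kind of informal-formal pairing the abstract describes, here is a hypothetical example (not taken from the benchmark) of an olympiad-style statement alongside a candidate Lean 4 formalization, sketched under the assumption that Mathlib is available; the theorem name and proof are illustrative only:

```lean
import Mathlib

-- Informal statement: "For every natural number n, n² + n is even."
-- A candidate formalization of the kind an LLM ensemble might propose,
-- which the pipeline would then type-check with Lean before human review.
theorem even_sq_add_self (n : ℕ) : Even (n ^ 2 + n) := by
  -- Rewrite n² + n as n * (n + 1), a product of consecutive naturals.
  have h : n ^ 2 + n = n * (n + 1) := by ring
  rw [h]
  exact Nat.even_mul_succ_self n
```

In the pipeline described above, a candidate like this would first be validated by the Lean prover; only type-checking candidates (and their failures) are then surfaced to human validators via the dashboard.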
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 20701