Keywords: AutoFormalization, Lean4, Dataset, LLM, AI4Math
TL;DR: This paper proposes FMC, a Lean4 formal language dataset at mathematical-competition difficulty, and evaluates state-of-the-art provers on it.
Abstract: Efficient and accurate autoformalization methods, which leverage the large body of natural language mathematical problems to construct formal language datasets, are key to advancing formal mathematical reasoning. In this paper, we propose an autoformalization pipeline based on large language models, using error feedback for syntactic verification and problem decomposition for semantic alignment checking, yielding a fully automatic and training-free formalization approach. Using this pipeline, we curate FMC, an Olympiad-level dataset aligning natural language problems with Lean formalizations. The dataset contains $3,214$ natural language mathematical problems and $6,994$ corresponding Lean statements, reflecting a one-to-many relationship in which a single problem may map to multiple formal representations. The dataset is well suited as a benchmark for automated theorem provers. Additionally, we investigate the formalization and reasoning capabilities of various LLMs and empirically demonstrate that problem decomposition, few-shot learning, and error feedback are key components for enhancing the autoformalization process.
Experiments with three automated theorem provers on the FMC dataset further highlight its challenging nature and its value as a benchmark for formal reasoning tasks.
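To make the pipeline described in the abstract concrete, the following is a minimal sketch of the formalize-verify-check loop under stated assumptions: `llm_formalize`, `lean_compile`, and `decompose_and_compare` are hypothetical placeholders standing in for the LLM call, the Lean 4 compiler check, and the decomposition-based semantic alignment check, respectively; they are not part of the paper's released code.

```python
# Sketch of an autoformalization loop with compiler error feedback and a
# decomposition-based semantic alignment check. All three helpers below are
# hypothetical placeholders, not an actual API from the paper.

from typing import Optional, Tuple


def llm_formalize(problem_nl: str, error_feedback: str = "") -> str:
    """Placeholder: few-shot prompt an LLM to emit a Lean 4 statement,
    optionally including the previous compiler error in the prompt."""
    raise NotImplementedError


def lean_compile(statement: str) -> Tuple[bool, str]:
    """Placeholder: run the Lean 4 compiler and return (success, error_message)."""
    raise NotImplementedError


def decompose_and_compare(problem_nl: str, statement: str) -> bool:
    """Placeholder: decompose the problem into sub-claims and check that the
    formal statement is semantically aligned with them."""
    raise NotImplementedError


def autoformalize(problem_nl: str, max_retries: int = 3) -> Optional[str]:
    """Translate a natural language problem into a Lean 4 statement, retrying
    with compiler error feedback and accepting only statements that pass the
    semantic alignment check."""
    feedback = ""
    for _ in range(max_retries):
        statement = llm_formalize(problem_nl, error_feedback=feedback)

        # Syntactic verification: feed any compiler error back into the next attempt.
        ok, error_msg = lean_compile(statement)
        if not ok:
            feedback = error_msg
            continue

        # Semantic alignment check via problem decomposition.
        if decompose_and_compare(problem_nl, statement):
            return statement
    return None  # all attempts failed either the syntactic or the semantic check
```

Because every component is a prompt or a compiler call, such a loop is training-free and can also emit several accepted statements per problem, which is consistent with the one-to-many mapping reported for the dataset.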
Primary Area: datasets and benchmarks
Submission Number: 22663