Keywords: autoformalization, Lean 4
TL;DR: We present an agentic system in which a generalist LLM orchestrates tools alongside a model fine-tuned for Lean 4 to autoformalize natural-language theorems into Lean 4, achieving over 50% translation success on a 400-theorem benchmark with fewer than 13 tool calls.
Abstract: The arrival of AI systems that can achieve a gold medal at the International Mathematical Olympiad (IMO) and the development of proof assistants such as Lean seem to foretell a revolution in mathematical research. However, a bottleneck is that most undergraduate- and graduate-level theorems have not yet been translated into code for proof assistants; performing this translation automatically is known as autoformalization.
State-of-the-art LLMs fine-tuned for Lean 4 report at most 22.5\% accuracy (Pass@128) on graduate-level theorems. To address this gap, we propose and evaluate ALA, an agentic framework in which a generalist LLM that orchestrates tools works together with a second LLM fine-tuned for Lean 4. ALA achieves 52\% accuracy with fewer than 13 tool calls on theorems from areas such as complex and real analysis, topology, and algebra. Our code and the related dataset are published on GitHub.
Submission Number: 244