Keywords: AI4Math, Automated Theorem Proving (ATP), Large Language Models (LLMs), Lean, Andrews–Curtis Conjecture, Reinforcement Learning (RL)
TL;DR: We developed a system that combines Lean formalization, LLMs, and RL to tackle the Andrews–Curtis conjecture, an open problem in group theory.
Abstract: Automated theorem proving (ATP) with large language models (LLMs) has made impressive progress on undergraduate and olympiad mathematics. However, these problems remain far from the frontier of open mathematical research. In this work, we push beyond competition benchmarks and investigate the unsolved Andrews–Curtis (AC) conjecture in group theory. We benchmark state-of-the-art LLM theorem provers on AC-related tasks and find a substantial performance gap: models that perform well on competition-level benchmarks fail at research-level reasoning. To bridge this gap, we formalize the AC conjecture in Lean and introduce ACC, a deterministic autoformalizer that rigorously verifies AC trivialization paths and produces the corresponding Lean proofs. Building on this, we use LLMs for theorem discovery, synthesizing patterns from ACC-generated Lean proofs into reusable theorem statements. Finally, we incorporate these theorems into the training of reinforcement learning (RL) agents that search for AC trivialization paths. We demonstrate that incorporating the theorems increases both the number of successful trivializations and the efficiency of RL training. Across all runs, we solved 753 presentations belonging to the Miller–Schupp (MS) family, ruling them out as potential counterexamples to the AC conjecture.
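To illustrate what verifying an AC trivialization path involves, the following is a minimal Python sketch. It is not the ACC tool described in the abstract and produces no Lean proof. It assumes the standard AC moves on a balanced presentation (multiply one relator by another, invert a relator, conjugate a relator by a generator), encodes words as tuples of nonzero integers (`1` for x, `-1` for its inverse, `2` for y, and so on), and uses hypothetical names (`free_reduce`, `apply_move`, `verify_path`) and a hypothetical move encoding throughout.

```python
from typing import Sequence, Tuple

Word = Tuple[int, ...]          # e.g. (2, 1) encodes the word y x
Presentation = Tuple[Word, ...]  # one word per relator


def free_reduce(word: Sequence[int]) -> Word:
    """Cancel adjacent inverse pairs such as x x^-1 using a stack."""
    stack = []
    for g in word:
        if stack and stack[-1] == -g:
            stack.pop()
        else:
            stack.append(g)
    return tuple(stack)


def invert(word: Sequence[int]) -> Word:
    """Inverse of a word: reverse it and invert every letter."""
    return tuple(-g for g in reversed(word))


def apply_move(rels: Presentation, move: tuple) -> Presentation:
    """Apply one AC move (hypothetical encoding) and return the new relators."""
    out = list(rels)
    kind = move[0]
    if kind == "mul":      # r_i -> r_i r_j, with i != j
        _, i, j = move
        if i == j:
            raise ValueError("mul requires two distinct relators")
        out[i] = free_reduce(rels[i] + rels[j])
    elif kind == "inv":    # r_i -> r_i^-1
        _, i = move
        out[i] = invert(rels[i])
    elif kind == "conj":   # r_i -> g r_i g^-1, g a generator or its inverse
        _, i, g = move
        out[i] = free_reduce((g,) + rels[i] + (-g,))
    else:
        raise ValueError(f"unknown move: {kind}")
    return tuple(out)


def verify_path(rels: Presentation, path: Sequence[tuple]) -> bool:
    """Check that the moves in `path` reduce `rels` to <x_1..x_n | x_1..x_n>."""
    current = tuple(free_reduce(r) for r in rels)
    for move in path:
        current = apply_move(current, move)
    trivial = tuple((k,) for k in range(1, len(rels) + 1))
    return current == trivial


# Toy example: <x, y | x, y x> is trivialized in three moves.
start = ((1,), (2, 1))
path = [("inv", 0), ("mul", 1, 0), ("inv", 0)]
print(verify_path(start, path))
```

In this toy example the path inverts the first relator, multiplies the second relator by it so that the trailing x cancels, and then inverts the first relator back. A deterministic checker of this kind only confirms that a proposed path is valid; finding such paths is the search problem that the RL agents address.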
Submission Number: 105