Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs
Keywords: Formal theorem proving, Theoretical Computer Science, LLM Reasoning
TL;DR: We propose a scalable synthesis framework for theorem proving challenges in rigorously aligned formal-informal pairs using TCS-inspired problem modules.
Abstract: Formal theorem proving (FTP) stands at the forefront of LLM reasoning, yet existing datasets are largely limited in scope due to their dependence on manual curation.
We identify theoretical computer science (TCS) as a promising domain as many TCS problems can be automatically modeled and verified, thereby enabling scalable formalization with rigorous alignment guarantee.
We showcase this approach by two TCS problem modules: the Busy Beaver (BB) challenge and the Mixed Boolean–Arithmetic (MBA) challenge.
We evaluate leading reasoning models and dedicated theorem provers on our benchmark, the best-performing DeepSeekProver-v2-671B attains only 57.5\% on BB challenge and no more than 12\% on MBA challenge.
These results reveal substantial reasoning gaps beyond conventional static benchmarks.
We call on the community to further investigate the potential of TCS problems in FTP domain where our approach enables fully automated synthesis of arbitrarily many problems that are universally **easy to verify** yet systematically **hard to prove**.
Supplementary Material: zip
Submission Number: 78
Loading