Scaling graph-based test time compute for automated theorem proving

ACL ARR 2025 February Submission 8515 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Large Language Models have demonstrated remarkable capabilities on natural language processing tasks that require multi-step logical reasoning, such as automated theorem proving. However, challenges persist in automated theorem proving, including identifying key mathematical concepts, understanding their interrelationships, and formalizing proofs within a rigorous framework. We present a novel framework that leverages knowledge graphs to augment LLMs in constructing and formalizing mathematical proofs. Furthermore, we study the effects of scaling test-time compute within our framework. Our results demonstrate significant performance improvements across multiple datasets: with knowledge graphs, our framework achieves up to a 34% success rate on the MUSTARDSAUCE dataset with o1-mini and consistently outperforms baseline approaches by 2-11% across different models. We show how this approach bridges the gap between natural language understanding and formal proof systems and improves results for foundation models over the baseline.
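A minimal sketch of the approach described in the abstract, purely for illustration: concepts retrieved from a knowledge graph are injected into the prover's prompt, and test-time compute is scaled by sampling more proof candidates (best-of-n). All names here (KnowledgeGraph, call_llm, verify_proof, prove) are hypothetical placeholders assumed for this sketch, not components taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): a knowledge-graph-augmented
# prover with best-of-n sampling as the test-time compute axis. Requires Python 3.10+.
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # concept -> directly related concepts, e.g. "group" -> {"subgroup", "coset"}
    edges: dict[str, set[str]] = field(default_factory=dict)

    def neighbors(self, concept: str, depth: int = 1) -> set[str]:
        """Collect concepts reachable within `depth` hops of `concept`."""
        frontier, seen = {concept}, {concept}
        for _ in range(depth):
            frontier = {n for c in frontier for n in self.edges.get(c, set())} - seen
            seen |= frontier
        return seen - {concept}


def call_llm(prompt: str, temperature: float) -> str:
    """Placeholder for an LLM call (e.g. to o1-mini); returns one candidate formal proof."""
    raise NotImplementedError


def verify_proof(candidate: str) -> bool:
    """Placeholder for a formal proof checker that accepts or rejects a candidate."""
    raise NotImplementedError


def prove(theorem: str, concepts: list[str], kg: KnowledgeGraph, n_samples: int) -> str | None:
    """Best-of-n proof search: larger n_samples means more test-time compute."""
    # Expand the given concepts with their knowledge-graph neighborhood.
    related = set().union(*(kg.neighbors(c) for c in concepts)) if concepts else set()
    prompt = (
        f"Theorem: {theorem}\n"
        f"Relevant concepts from the knowledge graph: {sorted(set(concepts) | related)}\n"
        "Produce a formal proof."
    )
    for _ in range(n_samples):
        candidate = call_llm(prompt, temperature=0.8)
        if verify_proof(candidate):
            return candidate
    return None
```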
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: Large Language Models (LLMs), mathematical reasoning, knowledge graphs, proof formalization, natural language understanding, logical reasoning, theorem proving, AI-augmented mathematics
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 8515