NaturalProver: Grounded Mathematical Proof Generation with Language Models

Sean Welleck; Jiacheng Liu; Ximing Lu; Hannaneh Hajishirzi; Yejin Choi

NaturalProver: Grounded Mathematical Proof Generation with Language Models

Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, Yejin Choi

Published: 31 Oct 2022, Last Modified: 06 Apr 2025NeurIPS 2022 AcceptReaders: Everyone

Keywords: language modeling, reasoning, neural theorem proving

Abstract: Theorem proving in natural mathematical language – the mixture of symbolic and natural language used by humans – plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence. Yet it has remained underexplored with modern generative models. We study large-scale language models on two new generation tasks: suggesting the next step in a mathematical proof, and full proof generation. We develop NaturalProver, a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding. On theorems from the NaturalProofs benchmark, NaturalProver improves the quality of next-step suggestions and generated proofs over fine-tuned GPT-3, according to human evaluations from university-level mathematics students. NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time, which is to our knowledge the first demonstration of these capabilities using neural language models.

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/naturalprover-grounded-mathematical-proof/code)

11 Replies

Loading