Learning to Reason in Large Theories without Imitation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: reinforcement learning, theorem proving, exploration, mathematics
Abstract: In this paper, we demonstrate how to perform automated higher-order logic theorem proving in the presence of a large knowledge base of potential premises without learning from human proofs. We augment premise exploration with a simple tf-idf (term frequency-inverse document frequency) based lookup in a deep reinforcement learning setting. Our experiments show that our theorem prover trained with this exploration mechanism but no human proofs, dubbed DeepHOL Zero, outperforms provers that are trained only on human proofs, and approaches the performance of a prover trained by a combination of imitation and reinforcement learning. We perform multiple experiments to understand the importance of the underlying assumptions that make our exploration approach work, thereby explaining our design choices.
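To illustrate the kind of tf-idf lookup the abstract describes, here is a minimal sketch in Python. It is not the DeepHOL Zero implementation: the function name, the tokenization choices, and the toy premise strings are hypothetical, and it assumes scikit-learn for the tf-idf machinery.

```python
# Minimal sketch of tf-idf based premise lookup (illustrative only;
# names and data are hypothetical, not from the DeepHOL Zero code).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_premises(goal, premises, k=10):
    """Return the k premises most similar to the goal under tf-idf."""
    # Treat whitespace-separated symbols as tokens so single-character
    # identifiers and operators are not discarded.
    vectorizer = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
    # Fit on the premise database plus the goal so they share a vocabulary.
    matrix = vectorizer.fit_transform(list(premises) + [goal])
    premise_vecs, goal_vec = matrix[:-1], matrix[-1]
    scores = cosine_similarity(goal_vec, premise_vecs).ravel()
    return [premises[i] for i in scores.argsort()[::-1][:k]]

# Toy usage: the strings stand in for theorem statements in the database.
db = ["ADD_COMM |- a + b = b + a",
      "MUL_ASSOC |- a * (b * c) = (a * b) * c",
      "ADD_ASSOC |- a + (b + c) = (a + b) + c"]
print(rank_premises("b + a = a + b", db, k=2))
```

Such a lookup requires no learned model at all, which is what makes it usable as an exploration signal before any training data exists.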
One-sentence Summary: Demonstrate that it is possible to learn a premise selection model for theorem proving in the absence of human proofs.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=-Bbz5KG1Ap