PutnamBench: A Multilingual Competition-Mathematics Benchmark for Formal Theorem-Proving

Published: 13 Jun 2024, Last Modified: 03 Jul 2024
Venue: ICML 2024 Workshop AI4MATH (Oral)
License: CC BY 4.0
Keywords: theorem proving, automated mathematical reasoning, AI for MATH, Lean 4, Coq, Isabelle
TL;DR: We present PutnamBench, a benchmark of 1337 formalizations of Putnam competition problems in Lean 4, Isabelle, and Coq.
Abstract: We present PutnamBench, a new multilingual evaluation benchmark for formal theorem-proving. PutnamBench consists of formalizations of problems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All of the problem statements come with formalizations in Lean 4 and Isabelle; a substantial subset has Coq formalizations as well. PutnamBench comprises 1337 hand-written formalizations across the three proof assistants and aims to benchmark the next generation of theorem-proving algorithms for competition mathematics. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We evaluate several established neural and symbolic theorem provers using PutnamBench. These approaches can solve only a handful of the problems, establishing our benchmark as a difficult open challenge for research on formal theorem-proving. PutnamBench is available at https://github.com/trishullab/PUTNAM.
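To give a sense of what a formalization in the benchmark looks like, below is a minimal sketch in Lean 4 in the style of a PutnamBench entry. The theorem name and statement are illustrative assumptions, not an actual benchmark problem; a prover is given the formal statement and must replace `sorry` with a proof.

```lean
import Mathlib

-- Hypothetical example in the style of a PutnamBench entry (not an actual
-- benchmark problem). The benchmark supplies the statement; a theorem-proving
-- system must discharge the `sorry`.
theorem putnam_example_sketch (n : ℕ) (hn : 0 < n) :
    n + 1 ≤ 2 ^ n := by
  sorry
```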
Submission Number: 18