Llemma: An Open Language Model For Mathematics

Published: 28 Oct 2023, Last Modified: 28 Oct 2023, MATH-AI 23 Poster
Keywords: language models, pretraining
TL;DR: An Open Language Model For Mathematics
Abstract: We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark, Llemma outperforms all known openly released models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any finetuning. We openly release all artifacts, including 7-billion- and 34-billion-parameter models, the Proof-Pile-2, and code to replicate our experiments.
Submission Number: 45