Solving Quantitative Reasoning Problems with Language Models

Aitor Lewkowycz; Anders Johan Andreassen; David Dohan; Ethan Dyer; Henryk Michalewski; Vinay Venkatesh Ramasesh; Ambrose Slone; Cem Anil; Imanol Schlag; Theo Gutman-Solo; Yuhuai Wu; Behnam Neyshabur; Guy Gur-Ari; Vedant Misra

Solving Quantitative Reasoning Problems with Language Models

Aitor Lewkowycz, Anders Johan Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Venkatesh Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra

Published: 31 Oct 2022, Last Modified: 14 Oct 2022NeurIPS 2022 AcceptReaders: Everyone

Keywords: language models, quantitative reasoning, transformers, math and science word problems

Abstract: Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering questions at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves strong performance in a variety of evaluations, including state-of-the-art performance on the MATH dataset. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a quarter of them.

TL;DR: We train a large Transformer language model on mathematical data and achieve strong performance on quantitative reasoning tasks, including state of the art performance on the MATH dataset.

Supplementary Material: zip

15 Replies

Loading