TinyGSM: achieving 80% on GSM8k with one billion parameters

Bingbin Liu; Sebastien Bubeck; Ronen Eldan; Janardhan Kulkarni; Yuanzhi Li; Anh Nguyen; Rachel Ward; Yi Zhang

Published: 28 Oct 2023, Last Modified: 28 Oct 2023

Venue: MATH-AI 23 Poster

Keywords: GSM8K, math word problem, reasoning, small language models, distillation, verifier

TL;DR: We train a 1.3B model to achieve 80.1% on GSM8K.

Abstract: Small models offer various computational advantages, yet the extent to which size is critical for problem-solving abilities remains an open question. This work studies the performance of small models on mathematical reasoning. Specifically, for solving math word problems, we find that a 1.3B model can achieve 80.1% accuracy on GSM8K, outperforming existing models that are orders of magnitude larger, and even rivaling the performance of the GPT-3.5-turbo teacher model from which the training data is generated. Our approach is simple and has two key components. The first is a synthetic dataset of math word problems with solutions, generated by GPT-3.5-turbo, which we will fully release. The second is a verifier, which selects the final output from multiple candidate generations.
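
The verifier step described in the abstract can be read as best-of-N selection: sample several candidate solutions, score each one, and keep the highest-scoring candidate. The following is a minimal sketch under that reading only. The function name `select_with_verifier`, the `score` callable, and the toy scores are hypothetical placeholders for illustration; they are not the authors' implementation, and the abstract does not specify how the verifier is trained or how its scores are computed.

```python
from typing import Callable, Sequence


def select_with_verifier(
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate that the verifier scores highest (best-of-N)."""
    if not candidates:
        raise ValueError("need at least one candidate generation")
    # max() keeps the first candidate in case of a tie.
    return max(candidates, key=score)


# Toy stand-in for a learned verifier: made-up scores, not results from the paper.
toy_scores = {"answer = 18": 0.91, "answer = 20": 0.12, "answer = 16": 0.35}

best = select_with_verifier(list(toy_scores), lambda c: toy_scores[c])
print(best)  # answer = 18
```

In a real pipeline the candidates would come from sampling the generator model several times, and `score` would wrap a trained verifier model rather than a lookup table.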

Submission Number: 56
