TinyGSM: achieving 80% on GSM8k with one billion parameters
Keywords: GSM8K, math word problem, reasoning, small language models, distillation, verifier
TL;DR: We train a 1.3B model to achieve 80.1% on GSM8K.
Abstract: Small models offer various computational advantages, yet the extent to which size is critical for problem-solving abilities remains an open question. This work studies the performance of small models on mathematical reasoning. Specifically, for solving math word problems, we find that a 1.3B model can achieve 80.1% accuracy on GSM8K, outperforming existing models that are orders of magnitude larger, and even rivaling the performance of the GPT-3.5-turbo teacher model from which the training data is generated. Our approach is simple and has two key components: The first is the use of a GPT-3.5-turbo-generated synthetic dataset of math word problems with solutions, which we will fully release. The second is the use of a verifier, which selects the final output from multiple candidate generations.
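To make the second component concrete, the following is a minimal sketch of verifier-guided best-of-N selection: sample several candidate solutions from the generator, score each with the verifier, and keep the highest-scoring one. The `generate` and `verifier` callables, the candidate count, and all names here are hypothetical placeholders, not the paper's actual interfaces.

```python
from typing import Callable, List

def select_with_verifier(
    question: str,
    generate: Callable[[str], str],
    verifier: Callable[[str, str], float],
    num_candidates: int = 48,
) -> str:
    """Best-of-N selection: sample candidate solutions and return the
    one the verifier scores highest for the given question."""
    candidates: List[str] = [generate(question) for _ in range(num_candidates)]
    scores = [verifier(question, c) for c in candidates]
    best_index = max(range(num_candidates), key=lambda i: scores[i])
    return candidates[best_index]
```

The design rationale is that generation quality varies across samples, so a separately trained scoring model can recover correct solutions that a single greedy decode would miss.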
Submission Number: 56