Keywords: Large Language Model; Math Reasoning; Data augmentation; Scaling relationship; Generalizability
TL;DR: This paper analyzes the scaling relationship and generalization of data augmentation in mathematical reasoning with large language models.
Abstract: In math reasoning with large language models (LLMs), augmenting the fine-tuning data through query evolution and diverse reasoning paths has been empirically verified to be effective, substantially narrowing the gap between open-source LLMs and cutting-edge proprietary LLMs.
In this paper, we conduct an investigation of such data augmentation in math reasoning and aim to answer:
(1) Which data augmentation strategies are more effective;
(2) What is the scaling relationship between the amount of augmented data and model performance; and
(3) Can data augmentation improve generalization to out-of-domain mathematical reasoning tasks?
To this end, we create a new dataset, AugGSM8K, by complicating and diversifying the queries from GSM8K and sampling multiple reasoning paths.
We obtain a series of LLMs, called MuggleMath, by fine-tuning on subsets of AugGSM8K. MuggleMath achieves a new state of the art on GSM8K (from 54\% to 68.4\% at the 7B scale, and from 63.9\% to 74.0\% at the 13B scale).
We observe a log-linear relationship between MuggleMath’s performance and the amount of augmented data.
We also find that MuggleMath generalizes weakly to the out-of-domain math reasoning benchmark MATH.
We attribute this to the difference in query distributions between AugGSM8K and MATH, which suggests that augmenting a single benchmark does not improve overall math reasoning performance.
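As a rough illustration of the log-linear relationship mentioned above (a minimal sketch of an assumed two-parameter fit; the symbols $a$, $b$, and $N$ are illustrative placeholders, not the paper's notation):
$$\text{Accuracy}(N) \approx a \cdot \log N + b,$$
where $N$ is the number of augmented fine-tuning samples and $a$, $b$ are fitted constants.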
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4983