Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: large language models, instruction tuning, reasoning, curriculum learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The recent popularity of large language models has been fueled in part by advances in instruction tuning, which has helped unlock new levels of zero-shot model performance. Much of the prior work in this area has focused on creating new datasets to improve or add specific skills (e.g., improving reasoning via chain-of-thought prompting), or on improving existing datasets by increasing the diversity of tasks and prompting templates. However, recent work has shown that instruction tuning can sometimes lead to performance degradation, an issue typically addressed by constructing better dataset mixes (or collections) through laborious and careful ablation studies to find the right composition. In this work, we propose spaced scheduling, a novel adaptive scheduling strategy motivated by the spaced repetition technique used in human learning, which builds a curriculum (or schedule) of training examples. Our approach performs data mix selection online during training, tailoring the composition of the training data to the chosen pre-trained model and reducing the need for extensive studies over different data compositions. Our results show that spaced scheduling yields better performance than random sampling, and comparable results in the worst case, while using less training data and mitigating catastrophic forgetting. Further, our proposed approach yields more \textit{balanced} performance across the subcategories of the tested benchmarks.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3809