Keywords: Continual learning, model merging, fine-tuning
Abstract: In continual learning with pretrained large language models (LLMs), where data from instruction fine-tuning (IFT) tasks arrives in a sequence, fine-tuning on later tasks often degrades performance on earlier tasks.
This is especially pronounced when the IFT tasks come from diverse domains.
In this setting, how can we mitigate catastrophic forgetting of earlier tasks and retain what the LLM has learned?
Inspired by a classical continual-learning method, an L2 penalty toward previous weights, we propose Sequential Fine-tuning with Averaging (SFA), a method that periodically merges the current model with checkpoints saved from previous tasks during the course of training.
State-of-the-art approaches typically maintain a data buffer of past tasks or impose a penalty at each gradient step. Our method achieves comparable results without storing past data or keeping multiple copies of the parameters at each gradient step.
Furthermore, our method outperforms penalty-based methods such as an L2 penalty toward previous weights and EWC, as well as common merging techniques such as Task Arithmetic and TIES-Merging.
Finally, we show that with our method, a single model can simultaneously perform well on a range of fine-tuning tasks across diverse domains, including Math, Law, and Code.
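The following is a minimal PyTorch sketch of the checkpoint-averaging idea described above, under one possible reading of the method: the final weights of each task are snapshotted, and during training on the next task the current weights are periodically averaged with that snapshot. The function name `sfa_finetune` and the hyperparameters `k` (averaging interval) and `alpha` (merge coefficient) are illustrative assumptions, not details from the submission.

```python
import torch

def sfa_finetune(model, tasks, optimizer_fn, loss_fn, k=100, alpha=0.5):
    """Fine-tune `model` on a sequence of tasks, periodically averaging
    its weights with the checkpoint saved at the end of the previous task.

    tasks: iterable of data loaders, one per IFT task, in arrival order.
    k, alpha: assumed averaging interval and merge coefficient (illustrative).
    """
    prev_ckpt = None  # no averaging during the first task
    for task_loader in tasks:
        optimizer = optimizer_fn(model.parameters())
        for step, (x, y) in enumerate(task_loader, start=1):
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            # Merge with the previous task's checkpoint every k steps:
            # p <- (1 - alpha) * p + alpha * p_prev
            if prev_ckpt is not None and step % k == 0:
                with torch.no_grad():
                    for name, p in model.named_parameters():
                        p.mul_(1 - alpha).add_(prev_ckpt[name], alpha=alpha)
        # Snapshot the weights at the end of this task for future averaging.
        prev_ckpt = {n: p.detach().clone() for n, p in model.named_parameters()}
    return model
```

With `alpha = 0.5`, each merge is an equal average of the current model and the previous task's checkpoint; in practice both `alpha` and the averaging frequency would presumably be tuned per setting.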
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11350