Keywords: finetuning, alignment, forgetting, distillation
TL;DR: We design a method to mitigate forgetting during language model finetuning using a self-distillation loss function.
Abstract: Large language models (LLMs) pretrained on large corpora of internet text acquire a broad base of world knowledge.
Following pretraining, one often conducts continued pretraining to strengthen certain capabilities such as math and coding, or applies "posttraining" (a.k.a. alignment) techniques to make the models follow users' instructions and align them with human preferences.
One challenge during these finetuning stages is that the model can lose pretraining knowledge or forget certain capabilities (e.g., in-context learning).
Moreover, although strong open-weight LLMs such as Llama 3 exist, neither their pretraining data nor their posttraining data is publicly available, so mitigating forgetting by mixing the finetuning data with the models' own pretraining data is not an option.
We propose label annealing, a method that mitigates forgetting during finetuning without requiring access to the original pretraining data.
Label annealing distills pretraining knowledge during finetuning by adding a KL-divergence term to the loss function, regularizing the divergence between the finetuned model's predictions and those of the initial pretrained model (a minimal illustrative sketch of such an objective follows the abstract).
In mathematics and code finetuning, label annealing improves the model's performance in target domains without sacrificing other capabilities of the pretrained model.
In alignment finetuning, our method introduces a smooth tradeoff between instruction-following capability and pretraining knowledge.
We complement our empirical investigation with a mathematical model based on overparameterized linear regression that provides geometric intuition for why label annealing helps.
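To make the abstract's description of the loss concrete, here is a minimal PyTorch-style sketch of a KL-regularized finetuning objective of this kind. It is an assumed formulation, not the authors' implementation: the function name, the KL direction, and the fixed weight `kl_weight` are illustrative choices, and the paper's actual method (e.g., any annealing schedule on the weight, as the name suggests) may differ.

```python
# Illustrative sketch (not the authors' code) of a KL-regularized finetuning loss:
# cross-entropy on the finetuning labels plus a KL term that keeps the finetuned
# model's token distribution close to that of the frozen pretrained model.
import torch
import torch.nn.functional as F


def kl_regularized_finetuning_loss(student_logits, frozen_teacher_logits,
                                   labels, kl_weight=0.1):
    vocab = student_logits.size(-1)
    student = student_logits.view(-1, vocab)            # finetuned model
    teacher = frozen_teacher_logits.view(-1, vocab)     # frozen pretrained model

    # Standard next-token cross-entropy on the finetuning data
    # (padding masking beyond ignore_index is omitted for brevity).
    ce = F.cross_entropy(student, labels.view(-1), ignore_index=-100)

    # KL divergence between the frozen pretrained model's distribution and the
    # finetuned model's distribution (the standard distillation direction;
    # the paper's exact direction is an assumption here).
    kl = F.kl_div(
        F.log_softmax(student, dim=-1),   # log-probs of the finetuned model
        F.log_softmax(teacher, dim=-1),   # log-probs of the frozen pretrained model
        log_target=True,
        reduction="batchmean",
    )
    return ce + kl_weight * kl
```

In practice, the teacher logits would come from a frozen copy of the pretrained checkpoint evaluated under `torch.no_grad()`, and the KL weight could be scheduled over the course of finetuning rather than held fixed.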
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7459