TL;DR: This paper focuses on understanding the benefits of ensembling in supervised fine-tuning.
Abstract: Supervised fine-tuning (SFT) on domain-specific data is the dominant approach for adapting foundation models to specialized tasks. However, it has been observed that SFT models tend to forget knowledge acquired during pretraining. In vision models, ensembling a pretrained model with its fine-tuned counterpart has been shown to mitigate this issue. In this work, we demonstrate that the same holds for language models, and, more strikingly, we observe an overadaptation phenomenon: the ensemble model not only retains general knowledge from the foundation model but also outperforms the fine-tuned model even on the fine-tuning domain itself.
Despite the empirical success of ensembling, a theoretical understanding of its benefits remains underexplored. We develop a formal theoretical analysis of the overadaptation phenomenon.
Ensembling mitigates overadaptation by balancing two primary sources of error: bias, caused by insufficient fine-tuning, and variance, introduced by overfitting to the fine-tuning data. While regularization techniques aim to address this trade-off, we show that ensembling provides a more effective solution. We analyze this phenomenon in over-parameterized linear settings and demonstrate that interpolating between pretrained and fine-tuned weights significantly improves performance. These findings offer theoretical justification for the observed advantages of model ensembling and are supported by empirical experiments consistent with our analysis.
Lay Summary: Foundation models, like large language models, are often adapted to new tasks using supervised fine-tuning (SFT). However, this fine-tuning can cause the model to "forget" useful general knowledge learned during pretraining. In vision, combining the original and fine-tuned models, known as ensembling, helps retain that knowledge. We show that ensembling has similar benefits for language models and, surprisingly, can even outperform the fine-tuned model on its own task. To understand why, we study a phenomenon called overadaptation, in which fine-tuning goes too far and the resulting model's risk becomes highly variable. We provide a theoretical explanation for the benefits of ensembling by analyzing the bias–variance trade-off: fine-tuned models suffer from high variance due to overfitting, while pretrained models have high bias because they underfit the new domain. Ensembling strikes a balance between these extremes. Our analysis in over-parameterized linear settings shows that interpolating between pretrained and fine-tuned weights can significantly improve performance, and our experiments are consistent with this prediction. This work helps explain why ensembling works so well and offers guidance for more robust model adaptation.
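For a concrete picture of the weight interpolation described above, here is a minimal sketch (not the authors' released code; see the repository linked below) of combining a pretrained and a fine-tuned checkpoint in PyTorch. It assumes both checkpoints share the same architecture and parameter names, with floating-point weights; the function name and `alpha` are illustrative choices, not names from the paper.

```python
import torch


def interpolate_state_dicts(pretrained_sd, finetuned_sd, alpha=0.5):
    """Parameter-wise interpolation: alpha * pretrained + (1 - alpha) * fine-tuned.

    alpha = 1.0 recovers the pretrained model, alpha = 0.0 the fine-tuned one;
    intermediate values trade off bias (underfitting the new domain) against
    variance (overfitting the fine-tuning data).
    """
    return {
        name: alpha * pretrained_sd[name] + (1.0 - alpha) * finetuned_sd[name]
        for name in pretrained_sd
    }


# Hypothetical usage with checkpoints saved via torch.save(model.state_dict(), ...):
# pretrained_sd = torch.load("pretrained.pt")
# finetuned_sd = torch.load("finetuned.pt")
# model.load_state_dict(interpolate_state_dicts(pretrained_sd, finetuned_sd, alpha=0.3))
```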
Link To Code: https://github.com/xypan0/LLMForgetting
Primary Area: Theory->Domain Adaptation and Transfer Learning
Keywords: model ensemble, fine-tuning, multi-task
Submission Number: 13280