Unveiling the Mystery of SFT’s Impact on Model Performance from Token Level and Parameter Level

ACL ARR 2025 February Submission1620 Authors

14 Feb 2025 (modified: 09 May 2025)
Abstract: Supervised fine-tuning (SFT) is a critical technique for adapting large language models (LLMs) to specific tasks using labeled data. In this paper, however, we present a counterintuitive finding: on the closed-book question answering (CBQA) task, LLMs fine-tuned with 1,920 data points perform 14% worse than those fine-tuned with only 240 data points. Moreover, fine-tuning with different subsets of the 1,920 data points yields performance fluctuations exceeding 12%. To investigate these discrepancies, we analyze the models at both the token and parameter levels. Our analysis shows that up to 90% of the parameter updates introduced by SFT are redundant. In certain cases, these updates cause catastrophic forgetting, wiping out previously mastered knowledge and degrading performance. Furthermore, the impact of these parameter changes depends heavily on the specific fine-tuning dataset. By reverting the redundant parameter updates, we reduce the distributional shift between the pretrained and fine-tuned models and achieve a 10% improvement in performance. These findings provide new insights into optimizing fine-tuning strategies for LLMs and mitigating performance degradation.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: contrastive explanations, data influence, natural language explanations
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 1620