Keywords: Fine-tuning, Large Language Models, Catastrophic Forgetting, Singular Value Decomposition
Abstract: In this paper, we address catastrophic forgetting in fine-tuning Large Language Models (LLMs), a phenomenon in which LLMs lose previously acquired knowledge and capabilities when learning new information. Traditional solutions mostly rely on replaying old training data, which requires knowledge of, and access to, the previously used data, both of which are often limited. In contrast, we propose a new strategy that operates directly on the model's weight matrices. Using Singular Value Decomposition (SVD), we identify the key components of these matrices, in particular the highest-magnitude singular directions, and preserve them to protect the model's most sensitive characteristics. Our approach thus restricts updates to the subspace spanned by lower-impact directions. This methodology efficiently mitigates catastrophic forgetting without requiring access to the original training data, offering a simpler and more data-efficient solution for practical LLM fine-tuning. We demonstrate the benefit of our approach by fine-tuning an LLM and showing a reduced performance drop on benchmark tasks induced by fine-tuning.
Submission Number: 97
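The abstract describes restricting weight updates to the subspace spanned by lower-impact singular directions. The following is a minimal sketch of one way such a projection could be applied to a single layer's gradient, assuming PyTorch; the function name `project_update`, the choice of `k`, and the use of plain SGD are illustrative assumptions, not the authors' exact procedure.

```python
import torch

def project_update(weight: torch.Tensor, grad: torch.Tensor, k: int) -> torch.Tensor:
    """Project a gradient update away from the top-k singular directions of `weight`.

    Illustrative sketch: the top-k left/right singular vectors are treated as the
    high-impact subspace to preserve; the update is restricted to its complement.
    """
    # SVD of the current weight matrix: weight = U diag(S) V^T
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    U_k = U[:, :k]        # top-k left singular vectors
    V_k = Vh[:k, :].T     # top-k right singular vectors

    # Remove the components of the update lying in the preserved subspaces,
    # leaving only the part spanned by lower-impact directions.
    grad = grad - U_k @ (U_k.T @ grad)   # project out the top-k left subspace
    grad = grad - (grad @ V_k) @ V_k.T   # project out the top-k right subspace
    return grad

# Usage example (hypothetical): one projected SGD step on a single weight matrix.
if __name__ == "__main__":
    torch.manual_seed(0)
    W = torch.randn(64, 64, requires_grad=True)
    x, y = torch.randn(8, 64), torch.randn(8, 64)

    loss = ((x @ W - y) ** 2).mean()
    loss.backward()

    with torch.no_grad():
        W -= 1e-2 * project_update(W, W.grad, k=8)
```

In this sketch the projection is recomputed from the current weights at each step; whether the preserved subspace is fixed from the pre-trained weights or refreshed during fine-tuning is a design choice not specified by the abstract.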