Keywords: Large Language Models, Low-Rank Adaptation, Catastrophic Forgetting
Abstract: In recent years, low-rank adaptation (LoRA) has emerged as a prominent fine-tuning paradigm that freezes the pre-trained weights and introduces small, learnable adapters instead of fine-tuning the full set of parameters. In this work, we uncover several key insights regarding the $\textit{singular}$ components of the network parameters obtained via Singular Value Decomposition (SVD). First, the dominant singular components with large singular values in the pre-trained network parameters can be effectively reused during fine-tuning, whereas the fine-grained components with smaller singular values are more task-specific and require substantial adaptation. Second, the growth of singular values in the LoRA adapter leads to the forgetting of pre-trained knowledge, a well-known issue called $\textit{catastrophic forgetting}$. Building upon these observations, we propose $\textbf{FCLoRA}$, which injects learnable fine-grained singular components into the pre-trained model. By employing a parameterized SVD and restricting the singular values to an appropriate range, $\textbf{FCLoRA}$ adapts effectively to new tasks by learning in the fine-grained singular domain while alleviating catastrophic forgetting. We conduct extensive experiments and demonstrate that $\textbf{FCLoRA}$ not only improves performance but also effectively retains pre-trained knowledge.
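The adapter structure described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the module name `FCLoRALinearSketch`, the rank, the `sigma_max` bound, and the sigmoid squashing are illustrative assumptions about how an SVD-parameterized adapter with bounded singular values might be realized (orthogonality constraints on the factors, if any, are omitted).

```python
import torch
import torch.nn as nn


class FCLoRALinearSketch(nn.Module):
    """Hypothetical sketch of an SVD-parameterized low-rank adapter.

    The frozen pre-trained weight W is augmented with a learnable
    low-rank update U @ diag(sigma) @ V^T, where the singular values
    `sigma` are squashed into (0, sigma_max) to limit drift from the
    pre-trained model (one possible reading of "restricting the
    singular values to an appropriate range").
    """

    def __init__(self, base_linear: nn.Linear, rank: int = 8, sigma_max: float = 0.1):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # freeze pre-trained weights

        out_f, in_f = base_linear.out_features, base_linear.in_features
        self.U = nn.Parameter(torch.randn(out_f, rank) / out_f ** 0.5)
        self.V = nn.Parameter(torch.randn(in_f, rank) / in_f ** 0.5)
        # Unconstrained parameter mapped to (0, sigma_max) via sigmoid;
        # initialized very negative so the update starts near zero,
        # mirroring the usual LoRA zero-initialization of the update.
        self.sigma_raw = nn.Parameter(torch.full((rank,), -6.0))
        self.sigma_max = sigma_max

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        sigma = self.sigma_max * torch.sigmoid(self.sigma_raw)  # bounded singular values
        delta = self.U @ torch.diag(sigma) @ self.V.t()          # low-rank weight update
        return self.base(x) + nn.functional.linear(x, delta)


# Usage sketch: wrap one linear layer of a pre-trained model.
layer = nn.Linear(768, 768)
adapted = FCLoRALinearSketch(layer, rank=8, sigma_max=0.1)
y = adapted(torch.randn(4, 768))
```

Bounding `sigma` directly controls the spectral norm of the injected update, which is one way to formalize the abstract's claim that growing adapter singular values drive catastrophic forgetting.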
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14904