Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning

ACL ARR 2024 April Submission 449 Authors

16 Apr 2024 (modified: 19 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract: Despite the remarkable performance of large language models (LLMs), continually fine-tuning them on complex and diverse tasks causes their performance on historical tasks to degrade dramatically, a phenomenon known as catastrophic forgetting. Existing works have explored strategies such as memory replay, regularization, and parameter isolation, but little analysis has been conducted on the optimization behavior of LLMs during continual fine-tuning. In this work, we investigate the geometric connections between different minima along continual LLM fine-tuning trajectories and discover the existence of low-loss valleys connecting the minima of different target tasks (known as mode connectivity). We validate this phenomenon on LLMs and propose a new method called \textbf{I}nterpolation-based \textbf{LoRA} (I-LoRA). I-LoRA constructs a dual-memory experience replay framework based on LoRA parameter interpolation, striking a balance between plasticity (learning of new information) and stability (preservation of historical knowledge). Experiments on eight domain-specific benchmarks demonstrate that I-LoRA consistently and significantly improves over previous approaches, with up to $11\%$ performance gains. Our code is available at \url{https://anonymous.4open.science/r/LLMCL-3823}.
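The abstract only sketches the dual-memory interpolation idea, so the following is a minimal illustrative sketch, not the paper's implementation: it assumes the fast (plasticity) memory is the LoRA adapter trained on the current task and the slow (stability) memory is a second copy of the LoRA parameters updated by linear interpolation. The names `LoRALinear`, `interpolate_lora`, and the coefficient `lam` are hypothetical.

```python
# Hypothetical sketch of a dual-memory LoRA update via parameter interpolation.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer with a trainable low-rank (LoRA) update."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


@torch.no_grad()
def interpolate_lora(slow: LoRALinear, fast: LoRALinear, lam: float = 0.95):
    """Move the slow (long-term) LoRA weights toward the fast (working) ones.

    slow <- lam * slow + (1 - lam) * fast, applied only to LoRA parameters,
    so historical knowledge is preserved while new-task updates leak in slowly.
    """
    for name in ("lora_A", "lora_B"):
        p_slow, p_fast = getattr(slow, name), getattr(fast, name)
        p_slow.mul_(lam).add_(p_fast, alpha=1.0 - lam)


if __name__ == "__main__":
    fast, slow = LoRALinear(16, 16), LoRALinear(16, 16)
    slow.load_state_dict(fast.state_dict())  # start from the same adapter
    # ... train `fast` on the current task, then periodically interpolate:
    interpolate_lora(slow, fast, lam=0.95)
```

Under this reading, `lam` controls the plasticity-stability trade-off: values near 1 favor stability (the slow memory changes little), while smaller values let new-task information flow in faster.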
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: continual learning, transfer learning, fine-tuning
Contribution Types: Model analysis & interpretability
Languages Studied: English, Chinese
Submission Number: 449