LoRA Without Forgetting: Freezing and Sparse Masking for Low-Rank Adaptation

Published: 05 Mar 2025, Last Modified: 11 Apr 2025 · SLLM · CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: Large Language Models, Parameter-Efficient Fine-Tuning, Sparsity, Catastrophic Forgetting
Abstract: Existing parameter-efficient fine-tuning (PEFT) methods for large language models (LLMs), such as LoRA, alleviate the computational burden but still introduce redundant trainable parameters and remain susceptible to knowledge degradation when fine-tuned sequentially. In this work, we propose LoRA without Forgetting (LoRAF), a novel PEFT method that reduces trainable parameters while mitigating catastrophic forgetting. LoRAF achieves this by freezing the low-rank matrix $A$ and applying sparse, task-specific masks to the low-rank matrix $B$. To prevent interference between tasks, LoRAF enforces non-overlapping masks across different tasks. We evaluate LoRAF on natural language understanding and mathematical reasoning tasks using Mistral-7B. Our results demonstrate that LoRAF outperforms full fine-tuning (FFT) and LoRA while using 95% fewer trainable parameters than LoRA. In a sequential learning setting, LoRAF significantly outperforms both LoRA and FFT in mitigating catastrophic forgetting.
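
To make the mechanism concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: the low-rank matrix $A$ is frozen, and only the entries of $B$ selected by a sparse, task-specific binary mask receive gradient updates; masks for different tasks are drawn from disjoint index sets so their updates cannot overlap. The class name LoRAFLinear, the random mask partitioning, and the density hyperparameter are illustrative assumptions, not the authors' implementation.

# Illustrative sketch of LoRAF-style masking (not the authors' code).
import torch
import torch.nn as nn

class LoRAFLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, num_tasks: int = 2, density: float = 0.05):
        super().__init__()
        self.base = base
        # The pretrained weight is frozen, as in standard LoRA.
        for p in self.base.parameters():
            p.requires_grad = False

        d_out, d_in = base.out_features, base.in_features
        # A is randomly initialized and frozen; only (masked) B is trainable.
        self.A = nn.Parameter(0.01 * torch.randn(rank, d_in), requires_grad=False)
        self.B = nn.Parameter(torch.zeros(d_out, rank))

        # Assign each task a disjoint subset of B's entries (non-overlapping masks).
        perm = torch.randperm(d_out * rank)
        per_task = int(density * d_out * rank)
        masks = torch.zeros(num_tasks, d_out * rank)
        for t in range(num_tasks):
            masks[t, perm[t * per_task:(t + 1) * per_task]] = 1.0
        self.register_buffer("masks", masks.view(num_tasks, d_out, rank))
        self.task_id = 0  # which task's mask is active

    def forward(self, x):
        # Entries of B outside the active mask contribute nothing and receive
        # zero gradient, so updates stay sparse and task-local.
        B_masked = self.B * self.masks[self.task_id]
        return self.base(x) + x @ self.A.t() @ B_masked.t()

# Example usage (hypothetical shapes):
#   layer = LoRAFLinear(nn.Linear(4096, 4096), rank=8, num_tasks=2)
#   layer.task_id = 1              # switch to the second task's mask
#   y = layer(torch.randn(2, 4096))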
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 58