DiRA: Nuclear Norm Dynamic Rank Adaptation for Large Language Models

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLM, Fine-Tuning, Nuclear-Norm
TL;DR: We introduce DiRA, a new PEFT method that not only improves model performance but also reveals changes in the rank landscape associated with catastrophic forgetting.
Abstract: Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), have become a standard paradigm for adapting Large Language Models (LLMs) to specific tasks. However, standard LoRA implementations use a fixed, uniform adaptation rank across all layers, a static allocation that fails to capture the varying contributions of different layers. In this work, we introduce DiRA, which learns layer-adaptive ranks by penalizing the nuclear norm of the weight update matrix $\Delta W$ at each layer. While extensive experiments show that DiRA matches or surpasses fixed-rank LoRA baselines across tasks, its primary contribution is methodological and scientific. Using DiRA as a probe, we uncover a mechanism of catastrophic forgetting in continual learning: forgetting is frequently accompanied by pronounced changes in the rank landscape. Building on this insight, we propose a strategy that treats the previously learned rank landscape as a prior and, using only a small amount of data, regularizes current updates so that the model retains newly acquired knowledge while recovering old-task memory, thereby mitigating forgetting. Taken together, these results position DiRA both as an efficient PEFT method and as a principled approach for understanding and mitigating forgetting in LLMs.
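
The abstract describes DiRA's objective only at a high level: a per-layer nuclear-norm penalty on the LoRA update $\Delta W = BA$, which shrinks small singular values and thereby lowers each layer's effective rank. The PyTorch sketch below is a minimal illustration of that idea under stated assumptions; the class `LoRALinear`, the function `dira_loss`, the penalty weight `lam`, and the `effective_rank` threshold are all illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, max_rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter is trained
        out_f, in_f = base.weight.shape
        # Standard LoRA init: A random, B zero, so delta_W starts at zero.
        self.A = nn.Parameter(torch.randn(max_rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, max_rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ (self.B @ self.A).T

    def nuclear_norm(self) -> torch.Tensor:
        # ||delta_W||_* is the sum of singular values of BA; penalizing it
        # drives small singular values toward zero, reducing effective rank.
        return torch.linalg.matrix_norm(self.B @ self.A, ord="nuc")

def dira_loss(task_loss: torch.Tensor, lora_layers, lam: float = 1e-4):
    """Task loss plus a per-layer nuclear-norm penalty (hypothetical form).

    Layers whose updates contribute less to the task end up with a lower
    effective rank, yielding the layer-adaptive rank allocation the
    abstract describes.
    """
    reg = sum(layer.nuclear_norm() for layer in lora_layers)
    return task_loss + lam * reg

@torch.no_grad()
def effective_rank(layer: LoRALinear, tol: float = 1e-3) -> int:
    """Count singular values above a relative threshold; collecting this
    per layer gives the 'rank landscape' used as a probe for forgetting."""
    s = torch.linalg.svdvals(layer.B @ layer.A)
    return int((s > tol * s.max()).sum()) if s.numel() > 0 else 0
```

In a training loop, one would replace selected `nn.Linear` modules with `LoRALinear`, optimize `dira_loss` in place of the plain task loss, and periodically log `effective_rank` per layer; tracking how that landscape shifts across tasks is the probe the abstract uses to study catastrophic forgetting.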
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 7607