Provable Forgetting Bounds Drive Capacity Savings: Spectral Thresholding in Continual LoRA

05 May 2026 (modified: 09 May 2026) · ICML 2026 Workshop CoLoRAI Submission · CC BY 4.0
Keywords: Continual Learning, Theoretical Guarantees, Low-Rank Adaptation, Orthogonal Subspace
TL;DR: For orthogonal continual LoRA, the residual spectral tail bounds past-task interference; minimizing this bound yields a single-threshold rank rule that matches fixed-rank InfLoRA on LLaMA-2 7B / TRACE with ~31% fewer parameters.
Abstract: Orthogonal-subspace LoRA methods mitigate forgetting in continual learning of foundation models by assigning each task a new adapter subspace, but existing approaches typically allocate a fixed, hand-picked rank to every task and layer. This ignores a key source of heterogeneity: the rank needed to avoid interference depends on the residual spectrum of the task's layerwise activations. We introduce **DyRA**, a theory-guided dynamic rank allocation rule that selects adapter bases by retaining the residual singular directions whose singular values exceed a global threshold $\tau$. Our method is motivated by a per-layer interference bound showing that, for orthogonal LoRA, future-task interference on a past task is controlled by the spectral tail left outside the past-task adapter basis. Thus, **DyRA** replaces a fixed-rank design choice with a spectral-tail control principle. On continual instruction tuning of the LLaMA-2 7B foundation model with TRACE, **DyRA** yields a favorable performance–capacity trade-off, with the clearest gains at low rank ($+4.7$ AP over fixed $r=4$), and matches fixed $r=8$ with about $31\%$ fewer adapter parameters. **DyRA** requires only one residual-spectrum computation per task and layer, and can be added to existing orthogonal-subspace LoRA methods.
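To make the thresholding rule concrete, here is a minimal NumPy sketch of the per-layer rank selection step described in the abstract. It is an illustration under stated assumptions, not the submission's implementation: the function name `select_rank`, the `r_max` cap, and the way the residual activation matrix is formed (projecting activations off previously allocated bases) are hypothetical details filled in for readability.

```python
import numpy as np

def select_rank(acts: np.ndarray, past_basis: np.ndarray | None,
                tau: float, r_max: int = 64):
    """Sketch of a spectral-threshold rank rule for one task and layer.

    acts:       (d, n) layerwise activation matrix for the new task.
    past_basis: (d, k) orthonormal basis of previously allocated adapter
                subspaces, or None for the first task (assumption: residuals
                are formed by projecting off this basis so the new subspace
                stays orthogonal to past-task adapters).
    Returns (U_r, r): the new adapter basis and the selected rank.
    """
    # Remove the span of past adapter bases to obtain the residual.
    if past_basis is not None:
        acts = acts - past_basis @ (past_basis.T @ acts)

    # One residual-spectrum computation (SVD) per task and layer.
    U, S, _ = np.linalg.svd(acts, full_matrices=False)

    # Keep every residual direction whose singular value exceeds tau;
    # the spectral tail below tau is what the interference bound controls.
    r = int(np.sum(S > tau))
    r = max(1, min(r, r_max))  # floor/cap is an added safeguard, not from the paper
    return U[:, :r], r
```

In this reading, a single global $\tau$ replaces the fixed per-task rank: layers and tasks with fast-decaying residual spectra automatically receive small ranks, while heavier spectral tails receive more capacity.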
Submission Number: 33