Keywords: Continual learning, Expander graphs, Multitask learning, Catastrophic forgetting, Sparse adaptation
TL;DR: We introduce a method for continual LLM learning that combines expander-graph masks for structured sparsity with EWC to prevent catastrophic forgetting. The approach, a multi-task extension of the DLTH, enables training on new tasks without forgetting prior ones.
Abstract: Continual learning for large language models (LLMs) faces a critical challenge: adapting to new tasks often results in catastrophic forgetting of prior knowledge and destructive interference across tasks. While sparse adaptation methods, such as Lottery Ticket Adaptation (LoTA), have emerged to mitigate these issues by optimizing only sparse subnetworks, they often rely on data-dependent mask calibration or random pruning. LoTA, for instance, identifies sparse subnetworks to avoid destructive interference and enables model merging, demonstrating improved performance over full fine-tuning (FFT) and low-rank adaptation (LoRA) in multi-task scenarios. Its extension, LoTTO, further enhances sequential training by learning mutually sparse masks to prevent overlap between tasks.
Building on these insights, our work introduces a novel approach for robust continual multi-task adaptation, designed to achieve high accuracy on two or more tasks (e.g., tasks A and B) without catastrophic forgetting. Our technique first selects subnetworks based on inherent structural properties, using expander graph masks rather than data-dependent or purely random selection. These expander masks provide a principled and structurally sound basis for defining the initial sparse subnetworks. To maintain high accuracy on both current and past tasks while actively preventing catastrophic forgetting, we then train the subnetworks defined by these structurally derived masks using Elastic Weight Consolidation (EWC). EWC selectively regularizes the parameters deemed important for previously learned tasks, preserving critical knowledge while enabling efficient adaptation to new objectives.
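For reference, EWC augments the loss on a new task B with a quadratic penalty that anchors parameters to the values learned on task A, weighted by a diagonal Fisher information estimate (the notation below is ours, not the submission's):
$$\mathcal{L}(\theta) = \mathcal{L}_B(\theta) + \sum_i \frac{\lambda}{2} F_i \left(\theta_i - \theta^{*}_{A,i}\right)^2,$$
where $F_i$ is the $i$-th diagonal entry of the Fisher information matrix estimated on task A, $\theta^{*}_{A}$ are the parameters after training on task A, and $\lambda$ controls how strongly prior knowledge is preserved. In the approach described above, this penalty is applied while fine-tuning the expander-masked subnetwork on the new task.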
This combined methodology not only yields demonstrably higher scores across multiple tasks but also offers a compelling multi-task extension of the Dual Lottery Ticket Hypothesis (DLTH): we claim that any two random expander masks can be transformed into highly trainable subnetworks, each achieving high accuracy on a distinct task. Our approach thus provides a powerful and efficient framework for robust continual learning in LLMs, addressing the core challenges of destructive interference and catastrophic forgetting through structured sparsity and principled knowledge preservation.
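As an illustrative sketch only (not the authors' implementation; the function names, the `lam` coefficient, and the mask/Fisher dictionaries below are placeholders), combining a fixed structured sparse mask with an EWC penalty during fine-tuning might look as follows in PyTorch:

```python
import torch

def ewc_penalty(model, fisher, theta_star, lam=0.4):
    """Quadratic EWC penalty anchoring parameters to their task-A values,
    weighted by a diagonal Fisher estimate computed on task A."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - theta_star[name]) ** 2).sum()
    return 0.5 * lam * penalty

def masked_ewc_step(model, masks, fisher, theta_star, batch_loss, optimizer):
    """One training step on task B: task loss plus EWC penalty, with gradients
    restricted to the sparse subnetwork defined by the (expander-derived) masks."""
    loss = batch_loss + ewc_penalty(model, fisher, theta_star)
    optimizer.zero_grad()
    loss.backward()
    # Zero out gradients outside the sparse mask so that only the
    # selected subnetwork is updated.
    for name, p in model.named_parameters():
        if name in masks and p.grad is not None:
            p.grad.mul_(masks[name])
    optimizer.step()
    return loss.item()
```

Here `masks` would hold binary tensors whose sparsity pattern comes from the chosen expander graph, `fisher` the diagonal Fisher estimates from task A, and `theta_star` a frozen copy of the task-A parameters.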
Primary Area: foundation or frontier models, including LLMs
Submission Number: 19863