Keywords: Continual Learning; Parameter-Efficient Fine-Tuning; Causal Learning
Abstract: Parameter-efficient fine-tuning (PEFT) methods have shown superior performance in continual learning. However, most existing PEFT-based methods focus on mitigating catastrophic forgetting by limiting the modifications that new tasks make to the old task model. This hinders backward knowledge transfer: when new tasks are strongly and positively correlated with old tasks, appropriately training on new tasks can transfer beneficial knowledge back to old tasks. Achieving backward knowledge transfer, however, faces two fundamental challenges: (1) some parameters may have little effect on task performance, which constrains the task solution space and model capacity; (2) since old task data are inaccessible, modeling task correlation via shared data is infeasible. To address these challenges, we propose CaLoRA, a novel \textbf{c}ausal-\textbf{a}ware \textbf{lo}w-\textbf{r}ank \textbf{a}daptation framework and, to our knowledge, the first PEFT-based continual learning method to achieve backward knowledge transfer. Specifically, we first propose \textbf{p}ar\textbf{a}meter-level \textbf{c}ounterfactual \textbf{a}ttribution (PaCA), which estimates the causal effect of LoRA parameters via counterfactual reasoning and thereby identifies effective parameters from a causal view. Second, we propose \textbf{c}ross-t\textbf{a}sk \textbf{g}radient \textbf{a}daptation (CaGA), which quantifies task correlation by gradient projection and evaluates task affinity based on gradient similarity. By incorporating causal effects, task correlation, and task affinity, CaGA adaptively adjusts task gradients, facilitating backward knowledge transfer without relying on data replay. Extensive experiments across multiple benchmarks and continual learning settings show that CaLoRA outperforms state-of-the-art methods. In particular, CaLoRA better mitigates catastrophic forgetting by enabling positive backward knowledge transfer.
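To make the two components concrete, below is a minimal, illustrative Python (PyTorch-style) sketch of the ideas the abstract describes: a PaCA-like counterfactual attribution that ablates a LoRA parameter tensor and measures the loss change, and a CaGA-like gradient adaptation that uses gradient projection and cosine similarity between a new-task gradient and a stored old-task gradient direction. The function names (`paca_causal_effect`, `caga_adjust_gradient`), the stored old-task gradient, the threshold `tau`, and the exact attribution and projection formulas are assumptions for illustration only; they are not taken from the paper.

```python
# Illustrative sketch (not the authors' implementation) of PaCA-style
# counterfactual attribution and CaGA-style gradient adaptation.
import torch
import torch.nn.functional as F


def paca_causal_effect(model, lora_params, loss_fn, batch):
    """Estimate the causal effect of each LoRA parameter tensor via a
    counterfactual ablation: zero it out and measure the loss change."""
    with torch.no_grad():
        base_loss = loss_fn(model(batch["x"]), batch["y"]).item()
    effects = {}
    for name, p in lora_params.items():
        original = p.data.clone()
        p.data.zero_()                          # counterfactual: remove the parameter
        with torch.no_grad():
            ablated_loss = loss_fn(model(batch["x"]), batch["y"]).item()
        p.data.copy_(original)                  # restore the parameter
        effects[name] = ablated_loss - base_loss  # larger -> more effective parameter
    return effects


def caga_adjust_gradient(g_new, g_old, effect, tau=0.0):
    """Adapt the new-task gradient based on its similarity to an old-task
    gradient direction, scaled by the parameter's estimated causal effect.

    If the tasks appear positively correlated (cosine similarity > tau),
    add a component along the old-task direction to encourage backward
    transfer; otherwise project out the conflicting component."""
    sim = F.cosine_similarity(g_new.flatten(), g_old.flatten(), dim=0)
    proj = (g_new.flatten() @ g_old.flatten()) / (g_old.flatten().norm() ** 2 + 1e-12)
    if sim > tau:
        # reinforce the shared direction in proportion to the causal effect
        return g_new + effect * proj * g_old
    # remove the conflicting component to limit interference with old tasks
    return g_new - proj * g_old
```

In this reading, the causal effect from the first step weights how strongly the new-task gradient is pulled toward directions that also benefit old tasks, which is one plausible way the abstract's "incorporating causal effect, task correlation, and affinity" could be operationalized without replaying old data.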
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 16688