Dual-Stage Gradient Projection Based Continual Learning: Enhancing Plasticity and Preserving Stability
Keywords: Continual Learning; Gradient Projection; Catastrophic Forgetting; Stability–Plasticity Trade-off; Curvature Information
TL;DR: We propose a loss-sensitive, two-stage gradient projection method for continual learning: it steers updates toward directions nearly orthogonal to the protected subspace, reducing the direction and magnitude distortion introduced by the final projection.
Abstract: In continual learning, gradient projection algorithms avoid forgetting by projecting the gradient onto the orthogonal complement of the feature space of previous tasks, thereby ensuring the model’s stability. However, strict orthogonal projection can cause the projected gradient to deviate sharply from the original gradient, impairing the model’s ability to learn new tasks and reducing its plasticity. Gradient-projection methods that relax the orthogonality constraint alleviate the deviation introduced by strict projection, yet the gradient distortion remains large and the model’s plasticity still needs improvement. To address this issue, we propose a continual-learning method based on two-stage gradient projection that improves the model’s plasticity on new tasks while preserving its stability on previous tasks. Specifically, in the first stage, we design a loss-sensitive space (LSS) regularization term (soft regularization) on top of the cross-entropy loss that encourages gradient updates to stay as close as possible to directions orthogonal to the feature space of previous tasks, thereby maintaining plasticity. In the second stage, a scaled projection (hard projection) further constrains the gradient to update along directions approximately orthogonal to the feature space of previous tasks, thus ensuring stability. Experimental results on three benchmark image classification datasets demonstrate that our method, for the first time, reduces the gap between the achieved classification accuracy and the task-specific upper bound (multitask) to within roughly 2\%, indicating that the model possesses both strong plasticity and stability.
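The abstract contrasts strict orthogonal projection with a relaxed, scaled projection. The sketch below is a minimal illustration of that general idea, not the paper's exact algorithm: `basis` and `scale` are assumed names, with `scale = 1.0` recovering strict projection onto the orthogonal complement of the protected subspace and `scale < 1.0` relaxing it to keep the update closer to the raw gradient.

```python
import numpy as np

def scaled_orthogonal_projection(grad, basis, scale=1.0):
    """Project a gradient (partially) away from a protected subspace.

    grad  : (d,) raw gradient vector for one layer.
    basis : (d, k) orthonormal basis spanning the previous-task feature subspace.
    scale : 1.0 gives strict orthogonal projection (maximum stability);
            values in (0, 1) only partially remove the protected component,
            trading some stability for plasticity on the new task.
    """
    # Component of the gradient lying inside the protected subspace.
    in_subspace = basis @ (basis.T @ grad)
    # Remove all (scale = 1) or part (scale < 1) of that component.
    return grad - scale * in_subspace
```

In this picture, the paper's first stage would correspond to a soft loss-side penalty that already pushes the raw gradient toward the orthogonal complement, so the second-stage hard projection above has less to remove and distorts the update's direction and magnitude less.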
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 8814