Dual-Stage Gradient Projection Based Continual Learning: Enhancing Plasticity and Preserving Stability
Keywords: Continual Learning; Gradient Projection; Catastrophic Forgetting; Stability–Plasticity Trade-off; Curvature Information
TL;DR: We propose a loss-sensitive, two-stage gradient projection method for continual learning: it steers updates toward directions nearly orthogonal to the protected subspace, reducing the direction and magnitude distortion introduced by the final projection.
Abstract: In continual learning, gradient projection algorithms avoid forgetting by projecting the gradient onto the orthogonal complement of the feature space of previous tasks, thereby ensuring the model’s stability. However, strict orthogonal projection can cause the projected gradient to deviate sharply from the original gradient, impairing the model’s ability to learn new tasks and reducing its plasticity. Gradient-projection methods that relax the orthogonality constraint alleviate the deviation introduced by strict projection, yet the gradient distortion remains large and the model’s plasticity still needs improvement. To address this issue, we propose a continual-learning method based on two-stage gradient projection that improves the model’s plasticity on new tasks while preserving its stability on previous tasks. Specifically, in the first stage, we design a loss-sensitive space (LSS) regularization term (soft regularization) on top of the cross-entropy loss that encourages gradient updates to stay as close as possible to directions orthogonal to the feature space of previous tasks, thereby maintaining plasticity. In the second stage, a scaled projection (hard projection) further constrains the gradient to update along directions approximately orthogonal to the feature space of previous tasks, thus ensuring stability. Experimental results on three benchmark image classification datasets demonstrate that our method, for the first time, reduces the gap between the achieved classification accuracy and the task-specific upper bound (multitask) to within roughly 2\%, indicating that the model possesses both strong plasticity and stability.
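The abstract contrasts strict orthogonal projection with a relaxed, scaled projection. The sketch below is a minimal illustration of that general idea, not the paper's exact algorithm: `basis` and `scale` are assumed names, with `scale = 1.0` recovering strict projection onto the orthogonal complement of the protected subspace and `scale < 1.0` relaxing it to keep the update closer to the raw gradient.

```python
import numpy as np

def scaled_orthogonal_projection(grad, basis, scale=1.0):
    """Project a gradient (partially) away from a protected subspace.

    grad  : (d,) raw gradient vector for one layer.
    basis : (d, k) orthonormal basis spanning the previous-task feature subspace.
    scale : 1.0 gives strict orthogonal projection (maximum stability);
            values in (0, 1) only partially remove the protected component,
            trading some stability for plasticity on the new task.
    """
    # Component of the gradient lying inside the protected subspace.
    in_subspace = basis @ (basis.T @ grad)
    # Remove all (scale = 1) or part (scale < 1) of that component.
    return grad - scale * in_subspace
```

In this picture, the paper's first stage would correspond to a soft loss-side penalty that already pushes the raw gradient toward the orthogonal complement, so the second-stage hard projection above has less to remove and distorts the update's direction and magnitude less.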
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 8814