Balancing Stability and Plasticity in Continual Learning: the readout-decomposition of activation change (RDAC) framework

23 Sept 2023 (modified: 11 Feb 2024) Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: continual learning, stability-plasticity trade-off, representational drift, task-incremental learning, readout misalignment, interpretability
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: The RDAC framework dissects the stability-plasticity trade-off in continual learning, revealing how existing algorithms balance it and offering a novel approach to maintaining stability without sacrificing plasticity.
Abstract: Continual learning (CL) algorithms strive to equip neural networks with the ability to acquire new knowledge while preserving prior information. However, the stability-plasticity trade-off remains a central challenge in CL. This paper introduces a framework that dissects this trade-off, offering valuable insights into CL algorithms. We first address the stability-plasticity dilemma and its relation to catastrophic forgetting, and then present the Readout-Decomposition of Activation Change (RDAC) framework, which relates learning-induced activation changes in the range of prior readouts to the degree of stability, and changes in the null space to the degree of plasticity. In deep non-linear networks tackling split-CIFAR-110 tasks, the framework was used to explain the stability-plasticity trade-offs of the popular regularization algorithms Synaptic Intelligence (SI), Elastic Weight Consolidation (EWC), and Learning without Forgetting (LwF), and of the replay-based algorithms Gradient Episodic Memory (GEM) and data replay. GEM and data replay preserved both stability and plasticity, while SI, EWC, and LwF traded off plasticity for stability. The inability of the regularization algorithms to maintain plasticity was linked to their restricting activation changes in the null space of the prior readouts. For one-hidden-layer linear neural networks, we additionally derived a gradient decomposition algorithm that restricts activation change only in the range of the prior readouts, to maintain high stability while not further sacrificing plasticity. Results demonstrate that the algorithm maintains stability without significant plasticity loss. The RDAC framework not only explains the behavior of existing CL algorithms but also paves the way for novel CL approaches. Finally, it sheds light on the connection between learning-induced activation/representation changes and the stability-plasticity dilemma, and also offers insights into representational drift in biological systems.
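The core decomposition described in the abstract can be illustrated with a minimal sketch (not the authors' code; shapes, variable names, and the use of a pseudoinverse-based projector are illustrative assumptions): a learning-induced activation change is split into the component lying in the range of a frozen prior readout, which alters prior-task outputs and thus bears on stability, and the component in its null space, which is invisible to the prior readout and thus bears on plasticity.

```python
# Hedged sketch of the range/null-space split underlying RDAC.
# W_prior, delta_h, and the projector construction are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_hidden, n_classes = 64, 10
W_prior = rng.standard_normal((n_classes, n_hidden))  # frozen readout of a prior task
delta_h = rng.standard_normal(n_hidden)               # activation change induced by new-task learning

# Orthogonal projector onto the row space of W_prior (i.e., the range of W_prior^T).
P_range = np.linalg.pinv(W_prior) @ W_prior
P_null = np.eye(n_hidden) - P_range

delta_range = P_range @ delta_h  # drives drift in prior-task outputs (stability-relevant)
delta_null = P_null @ delta_h    # leaves prior-task outputs unchanged (plasticity-relevant)

# Sanity checks: only the range component is seen by the prior readout,
# and the two components reconstruct the original change.
print(np.linalg.norm(W_prior @ delta_null))             # approximately 0
print(np.allclose(delta_h, delta_range + delta_null))   # True
```

Under this reading, a stability-preserving update would suppress (or regularize) the range component while leaving the null-space component free, which is the intuition behind the gradient decomposition algorithm mentioned for one-hidden-layer linear networks.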
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7950