Keywords: Continual Learning, dynamic programming, theory, CL challenges.
TL;DR: Characterizing effective model capacity in continual learning and formalizing its impact on the balance between forgetting and learning.
Abstract: The core issue in continual learning (CL) is balancing catastrophic forgetting of prior knowledge against generalization to new tasks, otherwise known as the stability-plasticity dilemma. We argue that this dilemma is closely tied to the capacity (the network's ability to represent tasks) of the neural network (NN) in the CL setting. Within this context, this work introduces "CL's effective model capacity (CLEMC)" to understand the dynamical behavior of the stability-plasticity balance point. We define CLEMC as a function of the NN, the task data, and the optimization procedure. Leveraging CLEMC, we demonstrate that the capacity is non-stationary and that, regardless of the NN architecture and optimization method, the network's ability to represent new tasks diminishes when the incoming tasks' data distributions differ from previous ones. We formulate these results using dynamical systems theory and conduct extensive experiments to complement the findings. Our analysis extends from small feed-forward (FNN) and convolutional networks (CNN), to medium-sized graph neural networks (GNN), to transformer-based large language models (LLMs) with millions of parameters.
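To illustrate the phenomenon the abstract describes, here is a minimal, hypothetical sketch (it is not the paper's CLEMC definition, which is not given in the abstract): a small feed-forward network is trained on a sequence of toy tasks whose input distributions drift, and a simple capacity proxy, the average loss over all tasks seen so far, is tracked. Under the abstract's claim, this proxy should degrade as the distribution shift grows; all function and variable names here are illustrative assumptions.

```python
# Hypothetical sketch, NOT the paper's method: track a capacity proxy
# (average loss over all tasks seen so far) while training sequentially
# on tasks with drifting input distributions.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift, n=512):
    """Toy regression task whose input distribution is shifted by `shift`."""
    x = torch.randn(n, 8) + float(shift)   # drifting input distribution
    y = torch.sin(x.sum(dim=1, keepdim=True))
    return x, y

net = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Five tasks whose input means drift from 0 to 3.
tasks = [make_task(shift) for shift in np.linspace(0.0, 3.0, 5)]

for k, (x, y) in enumerate(tasks):
    for _ in range(200):                    # train on the current task only
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()
    # Capacity proxy: mean loss over all tasks seen so far. A rising value
    # indicates the stability-plasticity balance drifting toward forgetting.
    with torch.no_grad():
        proxy = np.mean([loss_fn(net(xi), yi).item()
                         for xi, yi in tasks[:k + 1]])
    print(f"task {k}: capacity proxy = {proxy:.4f}")
```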
Supplementary Material: pdf
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10939