Keywords: Continual Learning, Knowledge Accumulation, Scaling
TL;DR: We propose SCoLe (Scaling Continual Learning), a framework for studying knowledge accumulation in continual learning under SGD training.
Abstract: Standard gradient descent algorithms applied to sequences of tasks are known to induce catastrophic forgetting in deep neural networks. When trained on a new task, the model's parameters are updated in a way that degrades performance on past tasks.
This article explores continual learning (CL) on long sequences of tasks sampled from a finite environment.
\textbf{We show that in this setting, learning with stochastic gradient descent (SGD) results in knowledge retention and accumulation without any dedicated memorization mechanism.} This contrasts with the prevailing view of forgetting in the CL literature, where training on new tasks in this way is reported to erase knowledge of previous tasks, especially in class-incremental settings.
To study this phenomenon, we propose an experimental framework, SCoLe (Scaling Continual Learning), which makes it possible to generate arbitrarily long task sequences. Our experiments show that previous results obtained on relatively short task sequences may not reveal phenomena that emerge only in longer ones.
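To make the setup concrete, below is a minimal sketch of how such a task-sequence generator and plain-SGD training loop might look. The dataset choice (MNIST), two classes per task, the number of tasks, and all hyperparameters are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the authors' code): build a long task sequence by
# repeatedly sampling small class subsets from a fixed dataset, then train a
# single model with plain SGD on each task in turn, with no replay buffer
# and no regularization.
import random
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def make_task_stream(dataset, num_tasks, classes_per_task, seed=0):
    """Yield (task_classes, Subset) pairs, each restricted to a random class subset."""
    rng = random.Random(seed)
    targets = torch.as_tensor(dataset.targets)
    all_classes = sorted(set(targets.tolist()))
    for _ in range(num_tasks):
        task_classes = rng.sample(all_classes, classes_per_task)
        idx = torch.nonzero(torch.isin(targets, torch.tensor(task_classes))).squeeze(1)
        yield task_classes, Subset(dataset, idx.tolist())

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 10))  # shared output head over all classes
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Long sequence of short tasks; one epoch of plain SGD per task.
for t, (task_classes, task_data) in enumerate(
        make_task_stream(train_set, num_tasks=100, classes_per_task=2)):
    loader = DataLoader(task_data, batch_size=64, shuffle=True)
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```

Evaluating the model on held-out data for all classes after each task (rather than only the current one) would then reveal whether knowledge accumulates over the long sequence despite per-task forgetting.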
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning