Continual Learning in Deep Networks: an Analysis of the Last Layer

Timothee LESORT; Thomas George; Irina Rish

Published: 28 Jan 2022, Last Modified: 22 Jun 2025 | ICLR 2022 Submitted | Readers: Everyone

Keywords: Continual Learning, Linear Models

Abstract: We study how different output layers in a deep neural network learn and forget in continual learning settings. Three factors can affect catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insight into how changing the output layer may address (1) and (2). We propose potential solutions to these issues and evaluate them in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing the parameterization is sufficient to achieve significantly better performance, without introducing a continual-learning algorithm and instead using standard SGD to train the model. Our analysis and results shed light on the dynamics of the output layer in continual learning scenarios, and suggest a way of selecting the best type of output layer for a given scenario.
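To make the idea of "changing the parameterization" of the output layer concrete, here is a minimal sketch, not the authors' implementation. It compares a standard linear layer with a cosine-similarity layer on fixed features. The cosine layer is an assumption chosen for illustration (the abstract does not name specific layer types), and the function names `linear_logits` and `cosine_logits` are hypothetical. The toy example inflates the norm of one class's weight vector, as might happen when only that class is trained, and shows that the cosine logits are unaffected while the linear logits change.

```python
import numpy as np

def linear_logits(features, weights, bias):
    """Standard linear output layer: logits = features @ W.T + b."""
    return features @ weights.T + bias

def cosine_logits(features, weights, eps=1e-12):
    """Illustrative reparameterization (an assumption, not from the abstract):
    cosine similarity between each feature vector and each class weight vector."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + eps)
    return f @ w.T

# Toy outputs of a fixed feature extractor: 2 samples, 3 dimensions, 2 classes.
features = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
weights = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
bias = np.zeros(2)

# Inflate the weight norm of class 1 only, mimicking a weight modification
# caused by training on a single class.
inflated = weights.copy()
inflated[1] *= 10.0

linear_before = linear_logits(features, weights, bias)
linear_after = linear_logits(features, inflated, bias)
cosine_before = cosine_logits(features, weights)
cosine_after = cosine_logits(features, inflated)
```

In this sketch the linear logits for class 1 scale with the weight norm, which can bias predictions toward the most recently trained class, whereas the cosine logits depend only on direction. Whether this matches the behavior studied in the paper is not stated in the abstract.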

One-sentence Summary: We analyze training of the output layer in various continual learning scenarios with a fixed feature extractor.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/continual-learning-in-deep-networks-an/code)
