Effect of scale on catastrophic forgetting in neural networks

Vinay Venkatesh Ramasesh; Aitor Lewkowycz; Ethan Dyer

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. Venue: ICLR 2022 Poster. Readers: Everyone

Keywords: Catastrophic forgetting, continual learning, scaling, language modeling, image classification

Abstract: Catastrophic forgetting presents a challenge in developing deep learning models capable of continual learning, i.e., learning tasks sequentially. Recently, both computer vision and natural-language processing have witnessed great progress through the use of large-scale pretrained models. In this work, we present an empirical study of catastrophic forgetting in this pretraining paradigm. Our experiments indicate that large, pretrained ResNets and Transformers are significantly more resistant to forgetting than randomly initialized, trained-from-scratch models; this robustness systematically improves with the scale of both the model and the pretraining dataset. We take initial steps towards characterizing what aspect of model representations allows them to perform continual learning so well, finding that in the pretrained models, distinct class representations grow more orthogonal with scale. Our results suggest that, when possible, scale and a diverse pretraining dataset can be useful ingredients in mitigating catastrophic forgetting.

One-sentence Summary: We find that large, pretrained models are robust to catastrophic forgetting.
