Regularizing Trajectories to Mitigate Catastrophic Forgetting

Paul Michel; Elisabeth Salesky; Graham Neubig

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: Regularizing the optimization trajectory with the Fisher information of old tasks greatly reduces catastrophic forgetting.

Abstract: Regularization-based continual learning approaches generally prevent catastrophic forgetting by augmenting the training loss with an auxiliary objective. However, in most practical optimization scenarios with noisy data and/or gradients, stochastic gradient descent can inadvertently change critical parameters. In this paper, we argue for the importance of regularizing optimization trajectories directly. We derive a new co-natural gradient update rule for continual learning whereby the new task gradients are preconditioned with the empirical Fisher information of previously learnt tasks. We show that using the co-natural gradient systematically reduces forgetting in continual learning. Moreover, it helps combat overfitting when learning a new task in a low-resource scenario.
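To make the idea of preconditioning new-task gradients with the empirical Fisher information more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a diagonal approximation of the Fisher matrix and an added damping constant, neither of which the abstract specifies; the function names, the toy gradients, and the hyperparameter values are all hypothetical.

```python
# Illustrative sketch only. Assumptions not stated in the abstract:
# a diagonal empirical Fisher, a damping constant, and all names and values below.

def empirical_fisher_diag(per_example_grads):
    """Diagonal empirical Fisher: mean of squared per-example gradients."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def preconditioned_step(params, new_task_grad, fisher_diag, lr=0.1, damping=1.0):
    """One update in which each new-task gradient entry is divided by
    (old-task Fisher entry + damping)."""
    return [
        p - lr * g / (f + damping)
        for p, g, f in zip(params, new_task_grad, fisher_diag)
    ]


# Hypothetical old-task gradients: parameter 0 is important, parameter 1 is not.
old_grads = [[2.0, 0.01], [-2.0, 0.02], [1.5, -0.01]]
fisher = empirical_fisher_diag(old_grads)

params = [0.0, 0.0]
new_grad = [1.0, 1.0]  # the new task pulls equally on both parameters
updated = preconditioned_step(params, new_grad, fisher)
```

In this toy setting, the parameter with the larger old-task Fisher value moves less than the other one, even though the new-task gradient is the same for both. This is the intuition behind constraining the trajectory rather than only adding a penalty to the loss.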

Keywords: Continual Learning, Regularization, Adaptation, Natural Gradient
