Gradient Projection Memory for Continual LearningDownload PDF

28 Sep 2020 (modified: 25 Jan 2021)ICLR 2021 OralReaders: Everyone
  • Keywords: Continual Learning, Representation Learning, Computer Vision, Deep learning
  • Abstract: The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance based weight update or replay of old data from the memory. In contrast, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for past tasks. We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single shot manner and store them in the memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient decent induces minimum to no interference with the past tasks, thereby mitigating forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
  • One-sentence Summary: To avoid catastrophic forgetting in continual learning, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for past tasks.
15 Replies

Loading