Keywords: Continual Learning, Representation Drift, Global Prototypes, Adaptation Model, Self-Supervised Learning
Abstract: Continual learning aims to learn sequentially from different tasks without catastrophic forgetting. Since no assumption is made about how tasks relate to one another, the knowledge learned from observed tasks may not align with what future tasks require. As a result, the model may undergo disruptive updates when learning future tasks, causing abrupt changes to previously learned knowledge (e.g., representation drift) that induce catastrophic forgetting. To reduce such disruptive updates, we connect knowledge for observed and unknown tasks by learning task data representations that are properly related to a set of global prototypes, which carry general-purpose connections and are shared across all tasks. We derive global prototypes and the corresponding objective for NLP tasks. For these tasks, suitable global prototypes can be obtained from a model pre-trained with masked language modeling, and data representations properly related to these prototypes can be learned through specific adaptations of the pre-trained model. We investigate existing adaptation models and propose a neighbor attention model that combines the complementary advantages of existing models for our objective. Experiments show that models whose data representations are well related to global prototypes suffer significantly less catastrophic forgetting, without memorizing information from past tasks.
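To make the idea concrete, below is a minimal sketch (an illustration, not the authors' exact architecture) of a neighbor-attention-style layer: the global prototypes are assumed to be the frozen input-embedding vectors of an MLM-pretrained model, and each token representation is re-expressed as an attention-weighted combination of its k nearest prototypes, anchoring learned representations to shared, task-agnostic vectors. The class name, the learned query projection, and the choice of k are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeighborPrototypeAttention(nn.Module):
    """Illustrative sketch: attend over the k nearest global prototypes."""

    def __init__(self, prototypes: torch.Tensor, k: int = 16):
        super().__init__()
        # Frozen global prototypes, e.g. the word-embedding matrix of a
        # pre-trained masked language model (shape: [vocab_size, dim]).
        self.register_buffer("prototypes", prototypes)
        self.k = k
        dim = prototypes.size(-1)
        # Hypothetical learned query projection acting as the adaptation model.
        self.query = nn.Linear(dim, dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, seq_len, dim] token representations from the encoder.
        q = self.query(hidden)                               # [B, T, D]
        scores = q @ self.prototypes.t()                     # [B, T, V]
        # Restrict attention to each token's k nearest prototypes.
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # [B, T, k]
        weights = F.softmax(topk_scores, dim=-1)             # [B, T, k]
        neighbors = self.prototypes[topk_idx]                # [B, T, k, D]
        # Output: convex combination of nearby, task-agnostic global prototypes.
        return (weights.unsqueeze(-1) * neighbors).sum(dim=-2)


if __name__ == "__main__":
    protos = torch.randn(30522, 64)   # stand-in for an MLM embedding matrix
    layer = NeighborPrototypeAttention(protos, k=8)
    out = layer(torch.randn(2, 5, 64))
    print(out.shape)                  # torch.Size([2, 5, 64])
```

Keeping the prototypes frozen and shared across tasks is what ties representations of new tasks back to the same anchors used for past tasks, which is the mechanism the abstract credits with reducing representation drift.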
Supplementary Material: pdf
Submission Number: 258