Keywords: Continual Learning, Lifelong Language Learning, Adapter Transformer, Natural Language Processing
TL;DR: Transformer with Adapter Modules that sequentially learns new NLP tasks in various domains and prevents catastrophic forgetting without retraining the model from scratch
Abstract: Continual Learning is important for real-world natural language processing applications,
where computational systems are required to interact with continuous streams of tasks
and language over time. When forced to adapt to new tasks and inputs, language models
experience catastrophic forgetting. Current generative replay-based algorithms do not
scale to many tasks, and their performance can degrade when the task order changes.
In this paper, we propose a model based on network growth - a pre-trained Transformer
with Adapter modules for each task - that sequentially learns new NLP tasks in various
domains and prevents catastrophic forgetting without retraining the model from scratch.
We train and maintain a lightweight adapter module for each task in sequence. While
growing the network by less than 15% and avoiding both replay and task-order bias,
the current design increases average task accuracy by 1.3% over the baseline
models.
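The core mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bottleneck width, the ReLU nonlinearity, the residual connection, and the task names are assumptions based on standard adapter designs, since the abstract does not specify them.

```python
import numpy as np

class Adapter:
    """Hypothetical bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. One such module is trained per task while the
    pre-trained Transformer weights stay frozen."""

    def __init__(self, hidden_dim, bottleneck_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Only these small matrices are trainable; the backbone is frozen.
        self.w_down = rng.normal(0.0, 0.02, (hidden_dim, bottleneck_dim))
        self.w_up = rng.normal(0.0, 0.02, (bottleneck_dim, hidden_dim))

    def __call__(self, h):
        z = np.maximum(h @ self.w_down, 0.0)  # ReLU in the bottleneck
        return h + z @ self.w_up              # residual preserves frozen features

# One adapter per task; at inference the adapter matching the task is swapped in.
adapters = {task: Adapter(768, 64) for task in ["task_a", "task_b"]}
h = np.zeros((2, 768))            # stand-in for a Transformer hidden state
out = adapters["task_a"](h)
print(out.shape)                  # (2, 768)
```

Because earlier tasks' adapters are never overwritten, previously learned behavior is preserved by construction, at the cost of a small per-task parameter overhead (here 2 × 768 × 64 weights per adapter).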