Towards a robust experimental framework and benchmark for lifelong language learning

Aman Hussain; Nithin Holla; Pushkar Mishra; Helen Yannakoudakis; Ekaterina Shutova

Towards a robust experimental framework and benchmark for lifelong language learning

Aman Hussain, Nithin Holla, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Published: 29 Jul 2021, Last Modified: 24 May 2023NeurIPS 2021 Datasets and Benchmarks Track (Round 1)Readers: Everyone

Keywords: lifelong learning, continual learning, natural language processing

Abstract: In lifelong learning, a model learns different tasks sequentially throughout its lifetime. State-of-the-art deep learning models, however, struggle to generalize in this setting and suffer from catastrophic forgetting of old tasks when learning new ones. While a number of approaches have been developed in an attempt to ameliorate this problem, there are no established, unified or generalized frameworks for rigorous evaluations of proposed solutions; a problem which is particularly pronounced in the domain of NLP. The few existing benchmarks are typically limited to a specific flavor of lifelong learning -- continual open-set classification -- where new classes, as opposed to tasks, are learned incrementally. Moreover, the only general lifelong learning benchmark combines a multi-label classification setup with a multi-class classification setup resulting in misleading gradients during training. We empirically demonstrate that the catastrophic forgetting observed here can be attributed to the experimental design rather than to any inherent modeling limitations. To address these issues, we propose an experimental framework for true, general lifelong learning in NLP. Using this framework, we develop a comprehensive suite of benchmarks that target different properties of lifelong learning (e.g., forgetting or intransigence); experiment with diverse facets of language learning: multi-domain, multilingual and different levels of linguistic hierarchy; and present a continuous evaluation scheme under a new metric: Area Under the Lifelong Test Curve. Our framework reveals shortcomings of prevalent memory-based solutions, demonstrating they are unable to outperform a simple experience replay baseline under the realistic lifelong learning setup.

URL: https://amanhussain.com/lifelong-learning/

TL;DR: To address critical flaws & limitations in the contemporary experiments, we design an experimental framework and develop a suite of benchmarks for lifelong learing in NLP.

Supplementary Material: zip

8 Replies

Loading