Improving Language Plasticity via Pretraining with Active Forgetting

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster
Keywords: plasticity, continual learning, meta-learning, embeddings, cross-lingual transfer, forgetting
TL;DR: Pretraining a language model with active forgetting imbues it with more plasticity and makes it adapt to new languages faster in low-data scenarios.
Abstract: Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown that it is possible to address this issue by learning a new embedding layer for the new language, doing so is both data and compute inefficient. We propose to use an active forgetting mechanism during pretraining as a simple way of creating PLMs that can quickly adapt to new languages. Concretely, by resetting the embedding layer every K updates during pretraining, we encourage the PLM to improve its ability to learn new embeddings within a limited number of updates, similar to a meta-learning effect. Experiments with RoBERTa show that models pretrained with our forgetting mechanism not only demonstrate faster convergence during language adaptation but also outperform standard ones in a low-data regime, particularly for languages that are distant from English. Code will be available at https://github.com/facebookresearch/language-model-plasticity.
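To make the "reset the embedding layer every K updates" idea concrete, the following is a minimal sketch in a PyTorch-style masked-LM training loop. The model, hyperparameters, reset interval, and initialization are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Sketch of active-forgetting pretraining: periodically re-initialize the
# embedding layer so the transformer body learns to cooperate with freshly
# drawn embeddings. All names and values here are hypothetical placeholders.
import torch
import torch.nn as nn

RESET_EVERY_K = 1000  # the paper's K; the actual value is an assumption here
VOCAB_SIZE, DIM = 1000, 64

class TinyMaskedLM(nn.Module):
    """Toy stand-in for a RoBERTa-like encoder: embeddings + transformer body."""
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(VOCAB_SIZE, DIM)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(DIM, VOCAB_SIZE)

    def forward(self, token_ids):
        return self.lm_head(self.body(self.embeddings(token_ids)))

model = TinyMaskedLM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for step in range(1, 5001):
    tokens = torch.randint(0, VOCAB_SIZE, (8, 32))  # dummy batch of token ids
    logits = model(tokens)
    loss = criterion(logits.view(-1, VOCAB_SIZE), tokens.view(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Active forgetting: every K updates, throw away the learned embeddings.
    if step % RESET_EVERY_K == 0:
        nn.init.normal_(model.embeddings.weight, mean=0.0, std=0.02)
```

At adaptation time, the same idea is applied once: the pretrained body is kept, a new embedding layer is initialized for the target language, and only that layer (or the whole model, depending on the setup) is trained on the new data; pretraining with repeated resets is what makes this relearning fast.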
Supplementary Material: pdf
Submission Number: 9064