Keywords: large language models, LLMs, catastrophic interference, online learning, continual learning, anticipatory recovery, cyclic training, structured training sequences
TL;DR: When we fine-tune LLMs cyclically on a fixed, repeated sequence of documents, the model recovers from catastrophic interference on each document before seeing it again.
Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit *anticipatory* behavior, recovering from forgetting on documents *before* seeing them again. This behavior emerges and becomes more robust as the number of model parameters scales up. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
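The cyclic training protocol described in the abstract can be illustrated with a minimal sketch, shown below. This is not the authors' code: the model name (`gpt2`), document texts, learning rate, sequence length, and number of cycles are all illustrative assumptions. The loop trains on each document in a fixed order, then logs the loss on every document so that anticipatory recovery (a document's loss dropping before it is trained on again) could be inspected.

```python
# Minimal sketch of cyclic fine-tuning on a fixed, repeated document sequence.
# Assumptions: small model (gpt2), toy documents, one gradient step per document.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

documents = ["document text 1 ...", "document text 2 ...", "document text 3 ..."]  # fixed order
num_cycles = 5

def doc_loss(text):
    """Causal LM loss of the current model on a single document (no gradients)."""
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

for cycle in range(num_cycles):
    for step, text in enumerate(documents):
        # Gradient update on the current document only.
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Log per-document losses; anticipatory recovery would appear as an
        # upcoming document's loss falling before its next training exposure.
        losses = [doc_loss(t) for t in documents]
        print(f"cycle {cycle}, step {step}: " + ", ".join(f"{l:.3f}" for l in losses))
```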
Supplementary Material: zip
Primary Area: Online learning
Submission Number: 5768