Keywords: large language models, LLMs, catastrophic interference, online learning, continual learning, anticipatory recovery, cyclic training, structured training sequences
TL;DR: When we fine-tune LLMs cyclically on a fixed, repeated sequence of documents, the model recovers from catastrophic interference on each document before seeing it again.
Abstract: We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs fine-tuned sequentially in this setting: they exhibit *anticipatory* behavior, recovering from forgetting on documents *before* seeing them again. This behavior emerges and becomes more robust as the number of model parameters scales up. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
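The cyclic training protocol described in the abstract can be illustrated with a minimal sketch, shown below. This is not the authors' code: the model name (`gpt2`), document texts, learning rate, sequence length, and number of cycles are all illustrative assumptions. The loop trains on each document in a fixed order, then logs the loss on every document so that anticipatory recovery (a document's loss dropping before it is trained on again) could be inspected.

```python
# Minimal sketch of cyclic fine-tuning on a fixed, repeated document sequence.
# Assumptions: small model (gpt2), toy documents, one gradient step per document.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption; the paper studies larger LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

documents = ["document text 1 ...", "document text 2 ...", "document text 3 ..."]  # fixed order
num_cycles = 5

def doc_loss(text):
    """Causal LM loss of the current model on a single document (no gradients)."""
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**batch, labels=batch["input_ids"])
    return out.loss.item()

for cycle in range(num_cycles):
    for step, text in enumerate(documents):
        # Gradient update on the current document only.
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        out = model(**batch, labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        # Log per-document losses; anticipatory recovery would appear as an
        # upcoming document's loss falling before its next training exposure.
        losses = [doc_loss(t) for t in documents]
        print(f"cycle {cycle}, step {step}: " + ", ".join(f"{l:.3f}" for l in losses))
```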
Supplementary Material: zip
Primary Area: Online learning
Submission Number: 5768