Keywords: Continual Learning, Domain-adaptation, Catastrophic Forgetting, Pretraining
TL;DR: We introduce a new method called Context-aware Continual Pretraining to mitigate catastrophic forgetting when continually pretraining Large Language Models.
Abstract: Retraining large language models (LLMs) from scratch to include novel, internal, or domain-specific knowledge is prohibitively computationally expensive. Therefore, practitioners rely on continual pretraining to adapt existing pretrained models to new data. As the model's parameters are updated to assimilate new information, the model can abruptly lose proficiency on previously learned domains, a phenomenon known as catastrophic forgetting. To address this issue, we propose Context-aware Continual Pretraining (CA-CPT), a simple technique that provides the model with sample-specific context before adapting its weights to new content in order to smooth the training loss. Our empirical results demonstrate that CA-CPT achieves comparable or superior performance on new-domain data while consistently mitigating the forgetting of both general knowledge and specialized instruction-following abilities. We show that our method is broadly applicable, is orthogonal to existing catastrophic forgetting mitigation strategies, and can serve as a building block for more robust continually learning language models.
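The abstract does not specify implementation details, but a minimal sketch of the general idea it describes (conditioning each weight update on a sample-specific context prefix) might look as follows. Everything here is an assumption for illustration: the model checkpoint, the `context_aware_loss` helper, the use of a short summary string as context, and the choice to mask context tokens out of the loss are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of context-aware continual pretraining, NOT the
# authors' reference implementation. Assumes the "sample-specific context"
# is a short text prefix and that the loss is computed only on the
# new-domain content, with the prefix acting purely as conditioning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def context_aware_loss(context: str, content: str) -> torch.Tensor:
    """Causal LM loss on `content`, conditioned on a prepended `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    doc_ids = tokenizer(content, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, doc_ids], dim=1)
    # Mask context tokens so only the new-domain content contributes to
    # the loss (an assumption; the paper may treat these differently).
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100
    return model(input_ids=input_ids, labels=labels).loss

# Hypothetical usage on a single new-domain sample.
loss = context_aware_loss(
    context="Document summary: internal API migration notes.",
    content="The v2 endpoint replaces the legacy /users route ...",
)
loss.backward()  # gradients would then feed a standard optimizer step
```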
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 7998