Elastic Weight Consolidation for Reduction of Catastrophic Forgetting in GPT-2

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Neural networks are naturally prone to catastrophic forgetting during fine-tuning. Despite the widespread adoption of transformers, little research has investigated the effects of catastrophic forgetting on attention-based architectures. In this work, we used elastic weight consolidation (EWC) to mitigate catastrophic forgetting caused by fine-tuning in one of the foundation models, GPT-2. We show that with EWC we can significantly slow down the forgetting process without a major penalty to performance on the task the model is fine-tuned for. We also find that the majority of important weights are located in the self-attention layers, while the parameters most sensitive to change are located in the normalization layers. Finally, we explore the instability of EWC and potential performance issues.
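For context on the method referenced in the abstract: EWC regularizes fine-tuning by adding a quadratic penalty that anchors parameters deemed important for the original task, with importance estimated via a diagonal Fisher information matrix. The sketch below is a minimal PyTorch illustration of this idea, not the paper's implementation; the helper names (`estimate_fisher`, `ewc_penalty`), the default `lam`, and the assumption of a HuggingFace-style GPT-2 model that returns `.loss` are all hypothetical choices for the example.

```python
import torch


def estimate_fisher(model, data_loader, n_batches=100):
    """Diagonal Fisher estimate: mean squared gradient of the LM loss
    on the original (pre-fine-tuning) task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    model.eval()
    seen = 0
    for batch in data_loader:
        if seen >= n_batches:
            break
        model.zero_grad()
        # Assumes a HuggingFace-style causal LM that returns .loss
        # when labels are included in the batch.
        loss = model(**batch).loss
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        seen += 1
    return {n: f / max(seen, 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    device = next(model.parameters()).device
    penalty = torch.zeros((), device=device)
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return 0.5 * lam * penalty


# During fine-tuning on the new task, the total loss becomes
#   loss = new_task_loss + ewc_penalty(model, fisher, old_params, lam)
# where fisher and old_params are snapshots taken before fine-tuning begins.
```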
Paper Type: short
