Inter-Batch Cross-Attention: See More to Forget Less

Abstract: This paper presents Inter-Batch Cross-Attention (IBCA), a simple training strategy that helps prevent catastrophic forgetting in continual learners. We find that adding an IBCA module at the input level can significantly improve a model's continual learning performance with minimal memory and performance overhead. The method requires only minimal changes to existing transformer-based architectures and can be combined with other continual learning strategies. We demonstrate its effectiveness on class-incremental text classification tasks built from the 20 Newsgroups dataset.
