Gated Class-Attention with Cascaded Feature Drift Compensation for Exemplar-free Continual Learning of Vision Transformers

Marco Cotogni; Fei Yang; Claudio Cusano; Andrew D. Bagdanov; Joost van de Weijer

Gated Class-Attention with Cascaded Feature Drift Compensation for Exemplar-free Continual Learning of Vision Transformers

Marco Cotogni, Fei Yang, Claudio Cusano, Andrew D. Bagdanov, Joost van de Weijer

22 Sept 2022 (modified: 26 May 2025)ICLR 2023 Conference Withdrawn SubmissionReaders: Everyone

Keywords: Exemplar-Free Continual Learning, Vision Transformer, Class-incremental learning

Abstract: Vision transformers (ViTs) have achieved remarkable successes across a broad range of computer vision applications. As a consequence there has been increasing interest in extending continual learning theory and techniques to ViT architectures. In this paper, we propose a new method for exemplar-free class incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay which can help recalibrate previous task classifiers to the feature drift which occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks which for some applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Secondly, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100 and Tiny-ImageNet demonstrate that our method outperforms existing exemplar-free state-of-the-art methods without the need to store any representative exemplars of past tasks.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning

TL;DR: We propose a gated-class attention mechanism with feature drift compensation that achieves improved plasticity and stability for exemplar-free continual learning of visual transformers

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/gated-class-attention-with-cascaded-feature/code)

6 Replies

Loading