Domain-Conditioned Transformer for Fully Test-time Adaptation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, Readers: Everyone, License: CC BY 4.0
Abstract: Fully test-time adaptation aims to adapt a network model online based on sequential analysis of input samples during the inference stage. We observe that, when a transformer network model is applied to a new domain, the self-attention profiles of image samples in the target domain deviate significantly from those in the source domain, which results in large performance degradation under domain shift. To address this issue, we propose a new structure for the self-attention modules in the transformer. Specifically, we incorporate three domain-conditioning vectors, called domain conditioners, into the query, key, and value components of the self-attention module. We learn a network to generate these three domain conditioners from the class token at each transformer network layer. We find that, during fully online test-time adaptation, the domain conditioners at each transformer network layer are able to gradually remove the impact of domain shift and largely recover the original self-attention profile. Our extensive experimental results demonstrate that the proposed domain-conditioned transformer significantly improves online fully test-time adaptation performance and outperforms existing state-of-the-art methods by large margins.
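The mechanism described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the authors' implementation: the abstract does not say how the conditioners are combined with the query, key, and value, so the sketch assumes they are simply added; the generator is assumed to be a single linear map `w_gen` from the class token; and all names (`conditioned_self_attention`, `w_gen`, and so on) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def conditioned_self_attention(tokens, w_q, w_k, w_v, w_gen):
    """Single-head self-attention with additive domain conditioners.

    tokens : (n, d) token embeddings, class token assumed at row 0.
    w_q, w_k, w_v : (d, d) projection matrices of the attention module.
    w_gen : (d, 3 * d) hypothetical linear generator that maps the class
        token to three conditioners (for query, key, and value).

    Returns the attended output (n, d) and the attention map (n, n).
    """
    d = tokens.shape[1]
    cls_token = tokens[0]

    # Generate the three domain conditioners from the class token.
    c_q, c_k, c_v = np.split(cls_token @ w_gen, 3)

    # ASSUMPTION: conditioners are added to every token's q, k, and v.
    q = tokens @ w_q + c_q
    k = tokens @ w_k + c_k
    v = tokens @ w_v + c_v

    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 5, 8
    tokens = rng.normal(size=(n, d))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

    # A zero generator yields zero conditioners, i.e. the unmodified module.
    w_gen = np.zeros((d, 3 * d))
    out, attn = conditioned_self_attention(tokens, w_q, w_k, w_v, w_gen)
    print(out.shape, attn.shape)
```

Initializing the hypothetical generator to zero makes the block behave exactly like ordinary self-attention, so adaptation would start from the source model's attention profile. One caveat of the additive form assumed here: a key offset shared by all tokens shifts every logit in a row by the same amount, so the softmax cancels it and only the query and value conditioners change the output. The paper's actual formulation may differ.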
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our submission addresses a critical challenge in multimedia computing, specifically online test-time adaptation, where we aim to enhance the image classification performance of Vision Transformers (ViT) across diverse domains during the online testing stage. By proposing a domain-conditioned transformer architecture that dynamically adapts to changes in the input during inference, our work advances online test-time adaptation methods. This has direct implications for multimedia applications, including image and video processing, where robustness to domain shifts is crucial for ensuring reliable performance in real-world scenarios.
Supplementary Material: zip
Submission Number: 1474