Abstract: Unsupervised Domain Adaptation (UDA) seeks to utilize the knowledge acquired from a source domain, abundant in labeled data, and apply it to a target domain that contains only unlabeled data. The majority of existing UDA research focuses on learning domain-invariant feature representations for both domains by minimizing the domain gap using convolution-based neural networks. Recently, vision transformers have made significant strides in enhancing performance across various visual tasks. In this paper, we introduce a Bidirectional Cross-Attention Transformer (BCAT) for UDA, which is built upon vision transformers with the goal of improving performance. The proposed BCAT employs an attention mechanism to extract implicit source and target mixup feature representations, thereby reducing the domain discrepancy. More specifically, BCAT is designed as a weight-sharing quadruple-branch transformer with a bidirectional cross-attention mechanism, allowing it to learn domain-invariant feature representations. Comprehensive experiments indicate that our proposed BCAT model outperforms existing state-of-the-art UDA methods, both convolution-based and transformer-based, on four benchmark datasets.
0 Replies
Loading