Abstract: Unsupervised domain adaptation (UDA) uses labeled data from a source domain to train classifiers for an unlabeled target domain. We leverage Contrastive Language-Image Pre-training (CLIP) models to exploit the textual information in labels, enabling simultaneous matching of textual and image features. However, adapting CLIP models to UDA tasks poses a significant challenge and warrants further investigation. To this end, we introduce CLIP-Enhanced Unsupervised Domain Adaptation with Consistency Regularization, which trains CLIP's prompts and image adapters concurrently under consistency regularization. The consistency regularization incorporates data augmentation to enhance the model's generalization capability: during training, we enforce consistent pseudo-labels for target-domain data regardless of whether weak or strong augmentation is applied. This strategy improves the model's robustness when adapting to various domains. In addition, integrating domain-specific prompts and image adapters enables the model to learn domain-related textual and image features. Experiments on real-world datasets demonstrate the effectiveness of the proposed method, which outperforms existing techniques across multiple benchmarks.
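The core consistency-regularization idea described above (pseudo-labels from a weakly augmented view supervising the strongly augmented view) can be sketched as follows. This is a minimal illustrative sketch in the style of FixMatch-like methods, not the paper's exact formulation; the function name, the confidence threshold of 0.95, and the hard pseudo-labeling choice are assumptions for illustration.

```python
import numpy as np

def consistency_loss(probs_weak, logits_strong, threshold=0.95):
    """Consistency regularization sketch (assumed formulation):
    take class probabilities on the weakly augmented target images,
    keep only confident predictions as hard pseudo-labels, and apply
    cross-entropy to the model's logits on the strongly augmented views."""
    pseudo = probs_weak.argmax(axis=1)     # hard pseudo-label per sample
    conf = probs_weak.max(axis=1)          # confidence of each pseudo-label
    mask = conf >= threshold               # keep only confident samples
    if not mask.any():
        return 0.0                         # no confident pseudo-labels yet
    # numerically stable log-softmax on the strong-view logits
    z = logits_strong[mask] - logits_strong[mask].max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # mean cross-entropy against the pseudo-labels
    return float(-log_probs[np.arange(mask.sum()), pseudo[mask]].mean())
```

In practice the weak and strong views would come from separate augmentation pipelines feeding the CLIP image encoder (with adapters), and this loss would be added to the supervised source-domain loss.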