Optimal Transport-Based Prompt Alignment for Unsupervised Domain Adaptation

Published: 2025 · Last Modified: 27 Jan 2026 · ICIC (3) 2025 · CC BY-SA 4.0
Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt a model trained on a labeled source domain to an unlabeled target domain. In recent years, vision-language models (VLMs) have emerged as powerful tools, achieving remarkable performance on various downstream tasks. However, when applied to UDA, these models often struggle to learn domain-invariant features effectively. In this paper, we propose an Optimal Transport-Based Prompt Alignment (OTPA) method for UDA that achieves fine-grained prompt alignment and learns domain-invariant features. OTPA leverages CLIP's zero-shot inference capabilities and the K-means algorithm to construct codebooks for both the source and target domains, followed by a two-stage alignment process. In the first stage, we perform token-level Optimal Transport (OT) alignment between image features and textual prompts to establish a strong performance baseline. In the second stage, we apply cross-attention between image features and the domain-specific codebooks, then perform prompt-level alignment of the enhanced image features with the textual features. This two-level OT alignment captures more fine-grained feature representations and yields domain-invariant features. Extensive experiments demonstrate that OTPA outperforms existing prompt learning methods on UDA tasks across various benchmarks.
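The token-level OT alignment described in the first stage can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes cosine distance as the transport cost and uniform marginals over tokens, and solves the entropic-regularized OT problem with standard Sinkhorn iterations to produce an alignment cost between a set of image token features and a set of prompt token features:

```python
import numpy as np

def sinkhorn(C, eps=0.05, n_iters=200):
    """Entropic-regularized OT plan for cost matrix C with uniform marginals."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):              # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan T

def ot_alignment_cost(img_tokens, txt_tokens):
    """Token-level OT alignment cost between two L2-normalized feature sets."""
    img = img_tokens / np.linalg.norm(img_tokens, axis=1, keepdims=True)
    txt = txt_tokens / np.linalg.norm(txt_tokens, axis=1, keepdims=True)
    C = 1.0 - img @ txt.T                 # cosine distance as transport cost
    T = sinkhorn(C)
    return float((T * C).sum())           # OT distance, usable as a loss

# Toy example: 4 image tokens vs 3 prompt tokens in an 8-dim feature space
rng = np.random.default_rng(0)
cost = ot_alignment_cost(rng.normal(size=(4, 8)), rng.normal(size=(3, 8)))
```

In practice this cost would be minimized during prompt learning, so that each image token is softly matched to the prompt tokens it best explains, rather than comparing only pooled global features.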