CPSR-CLIP: Conditional Prompt-Induced Style Reconstruction for Zero-Shot Domain Adaptation

Published: 01 Jan 2025, Last Modified: 12 Nov 2025. IEEE Trans. Multim. 2025. CC BY-SA 4.0
Abstract: Because existing unsupervised domain adaptation (UDA) techniques require target-domain data, researchers have shifted their focus to a more practical and challenging scenario: zero-shot domain adaptation (ZSDA). ZSDA remains difficult, however, and existing approaches often rely heavily on a carefully crafted, highly compatible auxiliary domain, which is impractical in real-world applications. To address these problems, we propose conditional prompt-induced style reconstruction with contrastive language-image pre-training (CPSR-CLIP), which leverages the rich semantic embeddings of CLIP to synthesize target-like features, effectively bypassing the need for auxiliary dual-domain samples. CPSR-CLIP adopts a multi-phase optimization strategy in which each phase is a prerequisite for the next. First, we propose dynamic prompt disentanglement, which helps the model differentiate the discrepancy between the source and target prompts and paves the way for the conditional prompt-induced style reconstruction phase. That phase strips away domain-specific styles to preserve domain-invariant features and injects target style characteristics through target-domain prompts. Finally, with the target-like features in hand, we adaptively adjust the learnable part of the target prompts for a closer fit. Extensive experiments on several datasets demonstrate the superiority of CPSR-CLIP over state-of-the-art methods.
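The core intuition of prompt-induced style synthesis can be illustrated with a minimal sketch: the gap between a source prompt embedding and a target prompt embedding defines a direction in CLIP's joint space, and shifting a source image feature along that direction yields a target-like feature without any target images. The code below is an illustrative toy, not the paper's actual algorithm; it uses random unit vectors as stand-ins for CLIP embeddings, and the function name `synthesize_target_like` and the scale `alpha` are assumptions for demonstration.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit hypersphere, as CLIP features are."""
    return v / np.linalg.norm(v)

def synthesize_target_like(f_src, t_src, t_tgt, alpha=1.0):
    """Shift a source feature along the source-to-target prompt direction.

    f_src : source image feature (stand-in for a CLIP image embedding)
    t_src : source-domain prompt embedding, e.g. "a photo of a {class}"
    t_tgt : target-domain prompt embedding, e.g. "a sketch of a {class}"
    alpha : how strongly to inject the target style (hypothetical knob)
    """
    d = normalize(t_tgt - t_src)          # domain-gap direction in CLIP space
    return normalize(f_src + alpha * d)   # inject target style, keep unit norm

# Toy demonstration with random stand-ins for CLIP embeddings.
rng = np.random.default_rng(0)
dim = 512
f_src = normalize(rng.standard_normal(dim))
t_src = normalize(rng.standard_normal(dim))
t_tgt = normalize(rng.standard_normal(dim))

f_like = synthesize_target_like(f_src, t_src, t_tgt)
# The synthesized feature should sit closer to the target prompt
# than the original source feature does.
print(float(f_like @ t_tgt) > float(f_src @ t_tgt))
```

In this toy, the target-like feature can then serve as a training signal for downstream fitting, loosely mirroring how the abstract's final phase tunes the learnable prompt tokens against synthesized features.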