HVCLIP: High-Dimensional Vector in CLIP for Unsupervised Domain Adaptation

Published: 29 Sept 2024, Last Modified: 03 Mar 2026, ECCV 2024, CC BY-SA 4.0
Abstract: Recent advances in large-scale image-text pre-training models, such as CLIP, have significantly improved unsupervised domain adaptation (UDA) by leveraging pre-trained knowledge to bridge the gap between the source and target domains. However, catastrophic forgetting remains a major challenge: traditional fine-tuning methods that adjust CLIP's weights on a target domain can quickly override its pre-trained knowledge. To address this issue, we propose converting CLIP's features into a high-dimensional vector (hypervector) space to exploit the robustness properties of hypervectors. We first study the feature dimension in hypervector space and empirically identify a dimension threshold that provides enough feature redundancy to avoid excessive training, thereby mitigating catastrophic forgetting. To further leverage the robustness of hypervectors, we propose Discrepancy Reduction, which reduces the domain shift between the source and target domains, and Feature Augmentation, which synthesizes labeled target-domain features from source-domain features. Our method achieves state-of-the-art results on four public UDA datasets, generalizes to other applications such as few-shot learning and continual learning, and exhibits model-agnostic properties across both vision-language and vision-only backbones.
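The abstract relies on the robustness of hypervectors without saying how CLIP features are mapped into that space. The sketch below illustrates that robustness using a generic hyperdimensional-computing encoder: a fixed random Gaussian projection followed by sign binarization into a bipolar {-1, +1} hypervector. This encoder, the dimensions (`FEATURE_DIM = 64` as a small stand-in for a CLIP feature, `HV_DIM = 4096`), and all function names are assumptions for illustration only. It is not HVCLIP's encoder and does not model Discrepancy Reduction or Feature Augmentation. It shows three properties: corrupting a fifth of the components still leaves a clearly similar vector, a slightly perturbed feature maps to a nearby hypervector, and an unrelated feature maps to one that is close to orthogonal.

```python
import math
import random

# Assumed sizes for illustration only; the paper's actual CLIP feature size
# and hypervector dimension threshold are not stated in the abstract.
FEATURE_DIM = 64
HV_DIM = 4096


def random_projection(in_dim: int, hv_dim: int, seed: int = 0) -> list[list[float]]:
    """Fixed Gaussian projection matrix with hv_dim rows and in_dim columns (hypothetical encoder)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(hv_dim)]


def encode(feature: list[float], proj: list[list[float]]) -> list[int]:
    """Project a feature vector and take the sign to get a bipolar {-1, +1} hypervector."""
    return [1 if sum(w * x for w, x in zip(row, feature)) >= 0.0 else -1 for row in proj]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def flip_components(hv: list[int], fraction: float, seed: int = 1) -> list[int]:
    """Corrupt a hypervector by negating a random subset of its components."""
    rng = random.Random(seed)
    flipped = list(hv)
    for i in rng.sample(range(len(hv)), int(fraction * len(hv))):
        flipped[i] = -flipped[i]
    return flipped


rng = random.Random(42)
proj = random_projection(FEATURE_DIM, HV_DIM)
feature = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
hv = encode(feature, proj)

# 1) Negating 20% of the components gives cosine = 1 - 2 * 819 / 4096, about 0.6.
corrupted_sim = cosine(hv, flip_components(hv, 0.2))

# 2) A slightly perturbed feature maps to a nearby hypervector.
noisy_feature = [x + rng.gauss(0.0, 0.1) for x in feature]
noisy_sim = cosine(hv, encode(noisy_feature, proj))

# 3) An unrelated random feature maps to a hypervector that is close to orthogonal.
other_feature = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
unrelated_sim = cosine(hv, encode(other_feature, proj))

print(f"corrupted: {corrupted_sim:.3f}, noisy: {noisy_sim:.3f}, unrelated: {unrelated_sim:.3f}")
```

For bipolar vectors of length D, negating k components changes the cosine similarity to exactly 1 - 2k/D. Redundancy therefore grows with D: each individual component carries less of the information, so a fixed number of corrupted components matters less. This is the intuition behind searching for a dimension threshold, although the threshold itself has to be found empirically, as the abstract describes.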