OSClip: Domain-Adaptive Prompt Tuning of Vision-Language Models for Open-Set Remote Sensing Image Classification

Dingkang Peng, Xiaokang Zhang, Wanjing Wu, Xianping Ma, Weikang Yu

Published: 01 Jan 2025, Last Modified: 07 Nov 2025. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. License: CC BY-SA 4.0
Abstract: Remote sensing image classification models face significant challenges when adapting to new domains due to variations in image acquisition conditions, sensor types, and scene categories. Conventional domain adaptation methods rely on multistage adaptation pipelines with limited semantic understanding, and even recently developed vision-language models (VLMs) still exhibit limited discriminative capability when encountering unseen images. To tackle these challenges, we propose OSClip, a novel open-set domain adaptation framework built on the contrastive language-image pretraining (CLIP) model. Specifically, OSClip harnesses the powerful generalization capabilities of CLIP by employing domain-adaptive prompt tuning, which inserts lightweight, learnable prompts into both the vision and language encoders. This design enables efficient adaptation to new, unlabeled target domains while retaining knowledge acquired during pretraining. Furthermore, a robust open-set recognition mechanism is incorporated by combining confidence-weighted pseudolabel supervision and energy-based regularization, further strengthened by a teacher–student self-distillation scheme to enhance pseudolabel reliability under unsupervised conditions. To support adaptation across multiple target domains while mitigating catastrophic forgetting, OSClip adopts a continual adaptation paradigm for the blended test set. It dynamically aggregates prompts based on the distribution of domain-specific features to ensure stable knowledge transfer. Extensive experiments on public remote sensing datasets demonstrate that OSClip consistently outperforms state-of-the-art methods, delivering superior accuracy in distinguishing known and unknown classes across various adaptation scenarios. The results also confirm the effectiveness of OSClip in achieving robust cross-modal and cross-domain semantic alignment.
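
The core idea of prompt tuning described above can be illustrated with a minimal sketch, not the authors' implementation: a small set of learnable prompt tokens is prepended to the token sequence of a frozen encoder, so only the prompts are updated during adaptation while the pretrained weights are preserved. The encoder below is a generic Transformer standing in for CLIP's vision or text encoder; names such as `PromptedEncoder`, `n_prompts`, and `embed_dim` are illustrative assumptions.

```python
# Minimal, hypothetical sketch of prompt tuning on a frozen encoder.
import torch
import torch.nn as nn


class PromptedEncoder(nn.Module):
    def __init__(self, embed_dim: int = 512, n_prompts: int = 8, depth: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # Freeze the pretrained backbone so pretraining knowledge is retained.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Lightweight, learnable prompt tokens: the only trainable parameters.
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, embed_dim) patch or word embeddings.
        batch = tokens.size(0)
        prompt_tokens = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompt_tokens, tokens], dim=1)
        x = self.backbone(x)
        # Mean-pool into a single feature vector for image-text matching.
        return x.mean(dim=1)


if __name__ == "__main__":
    encoder = PromptedEncoder()
    feats = encoder(torch.randn(2, 49, 512))   # e.g., 7x7 patch embeddings
    trainable = [n for n, p in encoder.named_parameters() if p.requires_grad]
    print(feats.shape, trainable)              # only 'prompts' is trainable
```

In this setup an optimizer would receive only `encoder.prompts`, which is what makes the adaptation lightweight; the paper's full method additionally combines such prompts with pseudolabel supervision, energy-based regularization, and prompt aggregation across domains.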