Abstract: Change detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of images at the pixel level is labor-intensive and costly, especially for multitemporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual-language models (VLMs) on zero-shot and open-vocabulary tasks with prompt-based reasoning, it is promising to utilize VLMs to improve CD under limited labeled data. In this article, we propose a VLM guidance-based semi-supervised CD method, namely SemiCD-VL. The insight of SemiCD-VL is to synthesize free change labels using VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multitemporal images. Motivated by this, we first propose a VLM-based mixed change event generation (CEG) strategy to yield pseudo-labels for unlabeled CD data. Since the additional supervision signals provided by these VLM-driven pseudo-labels may conflict with the original pseudo-labels from the consistency regularization paradigm (e.g., FixMatch), we propose a dual projection head to de-entangle the different signal sources. Furthermore, we explicitly decouple the semantic representations of the bitemporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to enable the model to capture change representations more adequately, we introduce contrastive consistency regularization (CCR) by constructing a feature-level contrastive loss in the auxiliary branches. Extensive experiments show the advantages of SemiCD-VL. For instance, SemiCD-VL improves the FixMatch baseline by $+5.3~\text{IoU}^{c}$ on WHU-CD and by $+2.4~\text{IoU}^{c}$ on LEVIR-CD with 5% labels, and SemiCD-VL requires only 5%–10% of the labels to achieve performance similar to that of supervised methods. In addition, our CEG strategy, in an unsupervised manner, achieves performance far superior to state-of-the-art (SOTA) unsupervised CD methods (e.g., IoU improves from 18.8% to 46.3% on the LEVIR-CD dataset). The code is available at https://github.com/likyoo/SemiCD-VL.
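The abstract describes two mechanisms in outline: a dual projection head that separates supervision from FixMatch-style pseudo-labels and VLM-derived pseudo-labels, and a feature-level contrastive consistency term. The PyTorch sketch below is not the authors' implementation (see the repository linked above for that); it is a minimal toy illustration, with hypothetical names such as DualHeadCD and contrastive_consistency and assumed tensor shapes, of how two heads over a shared change feature can receive the two label sources while a cosine-similarity term ties together weakly and strongly augmented views.

```python
# Minimal, hypothetical sketch (not the authors' code) of dual projection heads
# plus a simplified feature-level consistency term for semi-supervised CD.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHeadCD(nn.Module):
    def __init__(self, feat_dim=64, num_classes=2):
        super().__init__()
        # Stand-in for the real change-feature extractor over a bitemporal pair.
        self.backbone = nn.Conv2d(6, feat_dim, kernel_size=3, padding=1)
        # Two projection heads de-entangle the two supervision sources:
        # FixMatch-style pseudo-labels vs. VLM-generated change-event labels.
        self.head_fixmatch = nn.Conv2d(feat_dim, num_classes, kernel_size=1)
        self.head_vlm = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, img_t1, img_t2):
        feat = self.backbone(torch.cat([img_t1, img_t2], dim=1))
        return feat, self.head_fixmatch(feat), self.head_vlm(feat)

def contrastive_consistency(feat_weak, feat_strong):
    # Pull per-pixel features of weak/strong views together (1 - cosine similarity);
    # a simplified stand-in for the paper's feature-level contrastive loss.
    fw = F.normalize(feat_weak.flatten(2), dim=1)
    fs = F.normalize(feat_strong.flatten(2), dim=1)
    return (1.0 - (fw * fs).sum(dim=1)).mean()

# Toy unlabeled step: random tensors stand in for images and pseudo-labels.
model = DualHeadCD()
t1, t2 = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
feat_w, logits_fm, logits_vlm = model(t1, t2)
feat_s, _, _ = model(t1 + 0.1 * torch.randn_like(t1), t2)  # "strong" view
pseudo_fm = torch.randint(0, 2, (2, 64, 64))   # placeholder FixMatch pseudo-labels
pseudo_vlm = torch.randint(0, 2, (2, 64, 64))  # placeholder VLM-derived labels
loss = (F.cross_entropy(logits_fm, pseudo_fm)
        + F.cross_entropy(logits_vlm, pseudo_vlm)
        + contrastive_consistency(feat_w, feat_s))
```

In the actual method, the VLM-driven labels come from the mixed CEG strategy and the auxiliary segmentation decoders are likewise VLM-guided; the point of the sketch is only the separation of signal sources across heads and the additional feature-level consistency objective.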