Abstract: Existing leading remote sensing change detection (RSCD) methods often adopt a semantic-agnostic learning paradigm, using a binary ground-truth mask as the sole supervision for model training. Despite its demonstrated success, this paradigm is prone to being misled by irrelevant semantic category changes, owing to the extremely complex scene changes intrinsic to RS images, which leads to noisy CD mask predictions. To address this issue, this paper presents a Spatio-Semantic Prompt (SSP) guided adaptive Segment Anything Model (SAM) for RSCD, dubbed SSP-SAM. SSP-SAM introduces sparse textual and dense mask prompts into SAM to encode task-specific semantic knowledge for RSCD. Specifically, we first encode textual semantic knowledge with Contrastive Language-Image Pre-training (CLIP) to determine the semantic category of the desired changes. We then design a spatial dense prompt module that yields an attention map, used as prompt features to further refine the desired changed regions. Subsequently, we fine-tune SAM through an adapter that integrates the spatio-semantic prompt cues, yielding a coarse CD mask prediction. Finally, guided by the coarse CD mask, a multi-scale mask attention mechanism learns refined semantic representations of the changed targets and predicts an accurate final CD mask. Extensive experiments on a variety of benchmark datasets demonstrate that the proposed SSP-SAM achieves state-of-the-art performance.
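As a rough illustration of the pipeline the abstract describes, the PyTorch sketch below wires together stand-ins for the four stages: a CLIP-style text embedding as the sparse prompt, a spatial dense prompt module, an adapter for injecting the prompt cues, and mask-guided attention for coarse-to-fine refinement. Every module name, dimension, and wiring choice here is an assumption made for illustration only; the paper's actual architecture builds on full CLIP and SAM backbones and will differ in its details.

```python
import torch
import torch.nn as nn

class SpatialDensePrompt(nn.Module):
    """Stand-in for the spatial dense prompt module: produces an
    attention map highlighting candidate changed regions."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(2 * dim, 1, kernel_size=1)

    def forward(self, f1, f2):                         # (B, C, H, W) each
        return torch.sigmoid(self.proj(torch.cat([f1, f2], dim=1)))

class Adapter(nn.Module):
    """Bottleneck adapter: a small residual branch through which a
    frozen backbone can absorb spatio-semantic prompt cues."""
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):                              # (B, N, C) tokens
        return x + self.up(torch.relu(self.down(x)))

class MaskAttention(nn.Module):
    """Mask-guided attention: a log-mask bias steers attention toward
    locations the coarse CD mask marks as changed."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.scale = dim ** -0.5

    def forward(self, x, coarse_mask):                 # x: (B, N, C), mask: (B, N)
        bias = coarse_mask.clamp_min(1e-6).log().unsqueeze(1)        # (B, 1, N)
        attn = (self.q(x) @ self.k(x).transpose(1, 2)) * self.scale + bias
        return torch.softmax(attn, dim=-1) @ self.v(x)

# Toy forward pass on random tensors (shapes only, nothing is trained).
B, C, H, W = 1, 16, 8, 8
f1, f2 = torch.randn(B, C, H, W), torch.randn(B, C, H, W)  # bi-temporal feats
text_emb = torch.randn(B, 1, C)          # stand-in for a CLIP text embedding

prompt_map = SpatialDensePrompt(C)(f1, f2)                 # dense prompt map
tokens = ((f2 - f1) * prompt_map).flatten(2).transpose(1, 2)   # (B, HW, C)
tokens = Adapter(C)(tokens + text_emb)   # inject sparse + dense prompt cues

coarse = torch.sigmoid(nn.Linear(C, 1)(tokens)).squeeze(-1)    # coarse CD mask
refined = MaskAttention(C)(tokens, coarse)                 # mask-guided refinement
final = torch.sigmoid(nn.Linear(C, 1)(refined)).view(B, 1, H, W)
print(final.shape)                       # torch.Size([1, 1, 8, 8])
```

Note that the log-mask bias is a soft variant of the hard foreground masking used in mask-attention designs; it is chosen here only because it stays numerically stable in a self-contained demo. A multi-scale version, as the abstract describes, would apply this refinement at several feature resolutions.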