DSP-EntCD: A Knowledge-Freezing, Entropy-Guided Remote Sensing Change Detection Network with Domain-Specific Pretraining
Keywords: Remote Sensing; Change Detection; Domain-Specific Pretraining; Foreground-Background Imbalance; Multi-Scale
TL;DR: We propose DSP-EntCD, a two-stage framework that improves remote sensing change detection by freezing domain knowledge and guiding attention with entropy, mitigating foreground-background imbalance and improving the perception of multi-scale targets.
Abstract: Remote sensing change detection refers to identifying surface changes in very-high-resolution images acquired over the same geographic area at different times. It serves as a core technology in natural resource supervision and intelligent urban management. However, in most real-world scenarios, the changed regions occupy only a small portion of the image, causing existing methods to be biased toward background detection. In addition, change detection faces the challenge of spatio-temporal multi-scale heterogeneity, where change targets exhibit significant scale variations across temporal sequences and spatial dimensions, increasing the difficulty of feature modeling. To address these issues, we propose a knowledge-freezing two-stage training framework, termed Domain-Specific Pretraining and Entropy-Guided Change Detection (DSP-EntCD). First, we introduce a prior-driven training strategy called Domain-Specific Pretraining (DSP), which enhances the backbone’s sensitivity to foreground information. Second, we propose an Entropy-Guided Attention Selection Mechanism (EGASM) to estimate the uncertainty of spatial locations and alleviate fusion bias between the dual-branch encoders. Furthermore, we present a Semantic-Guided Cascaded Decoder (SGCD) that integrates high-level semantics, spatial awareness, and low-level details in a complementary manner, aiming to enhance perception of multi-scale change regions and improve detection accuracy across targets of varying sizes. Experiments conducted on three datasets with severe foreground-background imbalance demonstrate that our method achieves state-of-the-art performance.
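As a rough illustration of what "estimating the uncertainty of spatial locations" with entropy can look like, the sketch below computes the per-pixel Shannon entropy of a binary change-probability map. It is not the paper's EGASM, whose details are not given in the abstract; the function names and the toy probability map are assumptions made purely for illustration.

```python
import math

def binary_entropy(p, eps=1e-7):
    """Shannon entropy (in bits) of a Bernoulli change probability p."""
    # Clamp to avoid log2(0) at fully confident pixels.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def entropy_map(prob_map):
    """Apply binary_entropy to every pixel of a 2D list of probabilities."""
    return [[binary_entropy(p) for p in row] for row in prob_map]

# Hypothetical 2x2 change-probability map (not data from the paper).
prob_map = [[0.5, 0.99],
            [0.01, 0.9]]
ent = entropy_map(prob_map)
```

Pixels with probability near 0.5 receive entropy close to 1 bit (most uncertain), while confident pixels near 0 or 1 receive entropy close to 0. A map like this could in principle be used to weight or select locations, but how DSP-EntCD actually does so is not specified here.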
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10968