Abstract: Differential features obtained by comparing images from two periods play an indispensable role in the change detection (CD) task. However, a pair of bitemporal images can exhibit diverse changes, which give rise to varied differential features. Assigning changed pixel pairs with different differential features to the same category is thus a challenge for CD. Most contemporary methods address CD effectively by adequately modeling image representations. However, they often fail to fully exploit the semantics of changes and the relationships among the differential features of different pixel pairs. In this study, a novel change-specific semantic transformer (CSST) is proposed. It learns a change-specific semantic (CS) token from various changed pixel pairs directly within the differential feature space to capture the essence of “change.” Guided by this token, CSST refines the differential features through stacked cosine cross-attention and self-attention layers. The cosine cross-attention layer enables the semantic token to perceive which pixel pairs are changed and which are unchanged, and to share unified information with the changed ones. The self-attention layer then weights the important pixel pairs through interactions among the semantically enhanced differential features, which facilitates CD. In addition, CSST is compatible with various backbone networks and existing CD methods. Experiments on four CD datasets demonstrate that the proposed method achieves state-of-the-art (SOTA) performance and exhibits excellent generalization capabilities. Our code is publicly available at https://github.com/wdanc/CSST.
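The two-stage refinement described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes that differential features are flattened to one row per pixel pair, that the CS token is a single vector, and that no learned projections are used. The function names, the sigmoid gate, and the temperature `tau` are hypothetical choices made for illustration only.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cosine_cross_attention(diff_feats, cs_token, tau=0.1):
    """Illustrative cosine cross-attention (assumed form, not from the paper).

    diff_feats: (N, C) differential features, one row per pixel pair.
    cs_token:   (C,) change-specific semantic token.
    Returns the refined features (N, C) and the cosine similarities (N,).
    """
    # Cosine similarity between each pixel pair and the token, in [-1, 1].
    sim = l2_normalize(diff_feats) @ l2_normalize(cs_token)
    # Hypothetical gate: pairs that look "changed" get a weight near 1.
    gate = 1.0 / (1.0 + np.exp(-sim / tau))
    # Share the token's information mainly with the likely-changed pairs.
    refined = diff_feats + gate[:, None] * cs_token[None, :]
    return refined, sim


def self_attention(x):
    """Plain scaled dot-product self-attention over pixel pairs."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x


# Toy usage: three pixel pairs with four-channel differential features.
token = np.array([1.0, 0.0, 0.0, 0.0])
feats = np.array([[2.0, 0.0, 0.0, 0.0],
                  [0.0, 3.0, 0.0, 0.0],
                  [-1.0, 0.0, 0.0, 0.0]])
refined, sim = cosine_cross_attention(feats, token)
output = self_attention(refined)
```

Because cosine similarity ignores feature magnitude, the gate in this sketch depends only on how well a pixel pair's direction in feature space aligns with the token, which is one plausible reading of why a cosine-based attention suits changes of differing intensity.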