Collaborative Frequency-Aware Transformer for Unsupervised Multimodal Change Detection in Heterogeneous Remote Sensing Images

Published: 2025, Last Modified: 03 Feb 2026, IEEE Trans. Geosci. Remote. Sens. 2025, CC BY-SA 4.0
Abstract: Multimodal change detection (MCD) is an emerging task that aims to identify changed regions in bi-temporal remote sensing images (RSIs) of different modalities. Inspired by the success of the self-attention mechanism in the Transformer, recent work has attempted to solve MCD with Transformer variants. However, optimizing Transformer-based networks requires high-quality training samples. In addition, because multimodal data differ significantly in data distribution, semantic information, and feature representation, Transformer-based methods show clear deficiencies in local feature representation and spatial consistency, especially when dealing with heterogeneous images. To address these challenges, we propose a collaborative frequency-aware Transformer for MCD (CFAT-MCD). As an unsupervised framework, CFAT-MCD can learn more fine-grained patterns of land-cover change from a small number of pseudo-labels. The CFAT is designed to enhance spatial consistency and align features on a multiscale (MS) basis, which effectively mitigates the effects of modal differences. In addition, we propose a window-based spatial–frequency collaborative representation (SFCR) module that introduces frequency information into the spatial domain and improves the discriminability of spatial features. Extensive experiments on public datasets and quantitative analyses validate the superior detection performance of our approach and the effectiveness of each module. The source code will be released at https://github.com/pu7yan9/CFAT_MCD.
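The abstract does not specify how the SFCR module is built, so the following is only an illustrative sketch of the general idea of injecting window-level frequency information into spatial features. It is not the authors' implementation. The function names (`window_partition`, `window_merge`, `spatial_frequency_fuse`), the fixed high-pass filter, and the mixing weight `alpha` are all assumptions made for illustration; a learned module would presumably replace the fixed filter with trainable weights.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping (win, win, C) windows."""
    h, w, c = x.shape
    assert h % win == 0 and w % win == 0, "H and W must be divisible by win"
    x = x.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)


def window_merge(windows, h, w):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    win, c = windows.shape[1], windows.shape[3]
    x = windows.reshape(h // win, w // win, win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def spatial_frequency_fuse(x, win=4, alpha=0.5):
    """Hypothetical fusion step (an assumption, not the paper's SFCR module).

    Each window is transformed with a 2-D FFT, filtered in the frequency
    domain, transformed back, and added to the original spatial features.
    """
    h, w, _ = x.shape
    windows = window_partition(x, win)
    # Per-window 2-D FFT over the two spatial axes.
    spec = np.fft.fft2(windows, axes=(1, 2))
    # Assumed fixed high-pass filter: zero the DC term of every window,
    # which removes the window mean and keeps intra-window variation.
    spec[:, 0, 0, :] = 0
    freq_feat = np.fft.ifft2(spec, axes=(1, 2)).real
    # Residual-style combination of spatial and frequency-filtered features.
    fused = windows + alpha * freq_feat
    return window_merge(fused, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 3))
    print(spatial_frequency_fuse(feat, win=4).shape)
```

With this fixed filter the frequency branch reduces to subtracting each window's mean, so the sketch mainly shows the partition, transform, filter, and merge data flow that a window-based frequency branch would follow.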