Remote Sensing Image Change Detection Based on Lightweight Transformer and Multiscale Feature Fusion

Jingming Li, Panpan Zheng, Liejun Wang

Published: 2025, Last Modified: 06 Nov 2025 | IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2025 | CC BY-SA 4.0

Abstract: Deep learning has demonstrated excellent performance in remote sensing image change detection (CD). However, early methods, which mostly relied on convolutional neural networks (CNNs), are limited in accuracy by insufficient global feature representation, an inherent shortcoming of CNNs. The lack of global features can lead to notable issues, such as the inability to detect small targets and the loss of edge information. In recent years, vision transformers have been employed in CD owing to their powerful global feature representation capabilities. However, pure transformer methods lack effective local feature extraction, which also restricts CD performance, and the original transformer models require a large amount of computing resources. To address these issues and improve CD performance, we propose a lightweight transformer-based multiscale feature fusion network. By integrating CNN structures both before and after the multihead self-attention in each layer of the main backbone, we enhance the encoder's local feature extraction ability and reduce the computational complexity through convolution and linear operations. For the siamese encoding outputs at different scales, we design two distinct fusion modules based on depthwise convolution for bitemporal information fusion, one for deep layers and one for shallow layers. Our model employs a multilayer cascaded structure with a deep supervision strategy applied to multiple outputs. Experiments on four public CD datasets demonstrate that our network achieves better performance than other state-of-the-art CD methods while maintaining relatively low computational complexity.
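
As a rough illustration of two ideas named in the abstract, the sketch below compares weight counts for a standard convolution and a depthwise-based alternative, and expresses deep supervision as a weighted sum of per-output losses. It is not the authors' implementation. The channel width, the kernel size, the fusion of bitemporal features by channel concatenation, the pointwise 1 x 1 projection, and the loss weights are all assumptions made for the example.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_conv_params(channels, k):
    """Weights in a depthwise k x k convolution: one k x k filter per channel."""
    return k * k * channels


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise projection."""
    return depthwise_conv_params(c_in, k) + c_in * c_out


def deep_supervision_loss(losses, weights=None):
    """Weighted sum of the losses computed on several network outputs."""
    if weights is None:
        weights = [1.0] * len(losses)  # assumption: equal weighting by default
    if len(weights) != len(losses):
        raise ValueError("need exactly one weight per loss")
    return sum(w * l for w, l in zip(weights, losses))


# Hypothetical sizes, not taken from the paper.
C, K = 64, 3

# Assumed fusion layout: two temporal feature maps concatenated along
# channels (2C inputs) and projected back to C channels.
std = standard_conv_params(2 * C, C, K)        # 9 * 128 * 64 = 73728
sep = depthwise_separable_params(2 * C, C, K)  # 9 * 128 + 128 * 64 = 9344

# Assumed per-output losses from three supervised outputs.
total = deep_supervision_loss([0.5, 0.25, 0.25])  # 1.0 with equal weights
```

Under these assumed sizes the depthwise-based layer needs roughly an eighth of the weights of the standard convolution, which is the kind of saving that motivates depthwise convolution in lightweight designs.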

External IDs: dblp:journals/staeors/LiZW25