A Diff-Attention Aware State-Space Fusion Model for Remote Sensing Classification

Wenping Ma, Boyou Xue, Mengru Ma, Chuang Chen, Hekai Zhang, Hao Zhu

Published: 2025 (last modified: 25 Mar 2026) · IEEE Transactions on Geoscience and Remote Sensing, 2025 · License: CC BY-SA 4.0
Abstract: Multispectral (MS) and panchromatic (PAN) images describe the same land surface, so they not only have complementary strengths but also share a significant amount of redundant information. To separate the shared information from each modality's unique advantages, and thereby reduce feature redundancy at the fusion stage, this article introduces a diff-attention-aware state-space fusion model (DASF-Model) for multimodal remote sensing (RS) image classification. Building on the selective state-space model (SSM), a cross-modal diff-attention module (CDAM) is designed to extract and separate the common features and the respective dominant features of the MS and PAN images. Specifically, a space-preserving Visual Mamba (SPVM) retains the spatial structure of the image and captures local features by appropriately reorganizing Visual Mamba's input. Because the separated features exhibit large semantic differences at the fusion stage, and the traditional mean fusion method fails to effectively integrate features with such significant discrepancies, an attention-aware linear fusion module (ALFM) is proposed. It performs pixelwise linear fusion by computing influence coefficients, which allows it to fuse features with large semantic differences while keeping the feature size unchanged. Empirical evaluations indicate that the proposed method outperforms alternative approaches. The relevant code can be found at: https://github.com/AVKSKVL/DAS-F-Model
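To make the ALFM idea concrete, below is a minimal PyTorch sketch of what pixelwise attention-aware linear fusion could look like. This is an illustration under stated assumptions, not the authors' implementation: the module name, the 1x1-convolution branch that produces the influence coefficients, and the sigmoid gating are all hypothetical choices; only the overall behavior (a per-pixel linear blend that preserves the feature size) is taken from the abstract.

```python
import torch
import torch.nn as nn

class AttentionAwareLinearFusion(nn.Module):
    """Illustrative ALFM-style fusion sketch (not the paper's code).

    Given two feature maps with large semantic differences, compute a
    pixelwise influence coefficient alpha in [0, 1] and blend the inputs
    linearly, so the fused output keeps the same (B, C, H, W) shape.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical coefficient branch: 1x1 convolutions over the
        # concatenated features yield one influence coefficient per pixel.
        self.coeff = nn.Sequential(
            nn.Conv2d(2 * channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_ms: torch.Tensor, feat_pan: torch.Tensor) -> torch.Tensor:
        # alpha has shape (B, 1, H, W) and broadcasts over channels.
        alpha = self.coeff(torch.cat([feat_ms, feat_pan], dim=1))
        # Pixelwise linear fusion: output size matches each input.
        return alpha * feat_ms + (1.0 - alpha) * feat_pan


# Toy usage with random MS/PAN feature maps.
fuse = AttentionAwareLinearFusion(channels=64)
ms, pan = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
fused = fuse(ms, pan)
print(fused.shape)  # torch.Size([2, 64, 32, 32])
```

The sigmoid-gated convex combination is one simple way to realize the abstract's claim: unlike mean fusion, the blend weight varies per pixel, yet the fused tensor has exactly the same dimensions as each input.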