Abstract: This study introduces a novel cross-modal spatial-spectral interaction Mamba (CMS2I-Mamba) for remote sensing image fusion classification. Unlike convolution-based models, which focus on local details, and Transformer-based models, which incur high computational complexity, CMS2I-Mamba efficiently models global long-range dependencies with linear complexity. First, multispectral (MS) and panchromatic (PAN) images offer complementary advantages in spectral and spatial attributes, respectively. Accordingly, this article designs a multipath selective-scan mechanism (MPS2M) that applies distinct path-scanning strategies to capture global features along both the spectral and spatial dimensions, enhancing the robustness and complementarity of spatial-spectral features. Second, to overcome the representation differences between images acquired by different sensors, this article further introduces the channel interaction alignment module (CIAM), which employs efficient former-last and odd-even channel interaction strategies to achieve precise semantic alignment of deep features between modalities. Finally, to leverage shared fusion features to guide each modality's singular features, this article proposes a semantic-aware calibration module (SACM), which accurately constrains and calibrates the shared semantic information in deep features. This not only strengthens the model's understanding of scene semantics but also promotes the deep fusion and utilization of information across modalities. Experiments on multiple datasets verify that the proposed CMS2I-Mamba achieves excellent recognition performance and computational efficiency (parameter count and running speed) in fusion classification tasks. The code for CMS2I-Mamba is available at: https://github.com/ru-willow/CMSI-Mamba.
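To make the CIAM channel interaction strategies concrete, below is a minimal, hypothetical PyTorch sketch of what "odd-even" and "former-last" channel exchanges between two modality feature maps might look like. The function names (`odd_even_interaction`, `first_last_interaction`) and the exact exchange pattern are assumptions for illustration only; the authors' actual implementation is in the linked repository.

```python
import torch

def odd_even_interaction(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """Hypothetical sketch of the odd-even strategy: exchange odd-indexed
    channels between two modality features (e.g., MS and PAN) while keeping
    even-indexed channels. Shapes: (B, C, H, W) with matching C."""
    out_a, out_b = feat_a.clone(), feat_b.clone()
    out_a[:, 1::2] = feat_b[:, 1::2]  # odd channels of A taken from B
    out_b[:, 1::2] = feat_a[:, 1::2]  # odd channels of B taken from A
    return out_a, out_b

def first_last_interaction(feat_a: torch.Tensor, feat_b: torch.Tensor):
    """Hypothetical sketch of the 'former-last' strategy: pair the first
    half of one modality's channels with the last half of the other's."""
    c = feat_a.shape[1] // 2
    out_a = torch.cat([feat_a[:, :c], feat_b[:, c:]], dim=1)
    out_b = torch.cat([feat_b[:, :c], feat_a[:, c:]], dim=1)
    return out_a, out_b

if __name__ == "__main__":
    ms = torch.randn(2, 64, 16, 16)   # multispectral deep features
    pan = torch.randn(2, 64, 16, 16)  # panchromatic deep features
    a, b = odd_even_interaction(ms, pan)
    a, b = first_last_interaction(a, b)
    print(a.shape, b.shape)  # torch.Size([2, 64, 16, 16]) each
```

Either exchange is parameter-free and preserves feature shape, which is consistent with the abstract's claim that the interaction strategies are efficient; how the exchanged features are subsequently aligned is left to the full CIAM design in the paper.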