LSFMamba: Local-enhanced Spiral Fusion Mamba for Multi-modal Land Cover Classification

Honghao Chang, Haixia Bi, Chen Xu, Fan Li

Published: 01 Jan 2026, Last Modified: 27 Jan 2026 · IEEE Transactions on Circuits and Systems for Video Technology · CC BY-SA 4.0
Abstract: Multi-modal learning, which fuses complementary information from different modalities, has significantly improved the accuracy of land cover classification, especially under adverse conditions such as cloudy or rainy weather. Recent advances in multi-modal remote sensing land cover classification (MMRLC) have demonstrated the efficacy of CNN- and Transformer-based approaches. However, CNNs exhibit limitations in capturing long-range dependencies, whereas Transformers suffer from high computational complexity. Recently, Mamba has garnered widespread attention due to its superior long-range modeling capability with linear complexity. Nevertheless, Mamba exhibits notable limitations when directly applied to MMRLC, including limited local contextual modeling capacity, suboptimal multi-modal feature fusion, and the lack of a task-specific spatial-continuity scanning strategy. Hence, to fully explore the potential of Mamba in multi-modal land cover classification, we propose LSFMamba, which comprises multiple hierarchically connected local-enhanced fusion Mamba (LFM) modules. Within each LFM module, a local-enhanced visual state space (LVSS) block is designed to extract features from different modalities, while a cross-modal interaction state space (CISS) block fuses these multi-modal features. In the LVSS block, we integrate a multi-kernel CNN block into the gating branch of Mamba to enhance its local modeling capability. In the CISS block, features from different modalities are interleaved, facilitating cross-modal feature interaction through the state space model. Furthermore, we introduce a novel spiral scanning strategy that reassesses the significance of central pixels, a design driven by the unique characteristics of the pixel-wise classification task. Extensive experimental results on three multi-modal remote sensing datasets demonstrate that the proposed LSFMamba achieves state-of-the-art performance with lower complexity.
The code will be released at https://github.com/hhchhang78/LSFMamba.
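As a concrete illustration of the two mechanisms named in the abstract, the sketch below shows (a) token-wise interleaving of two modality sequences, as the CISS block's interleaving step might be realized, and (b) an outward spiral traversal of an H×W patch starting at the central pixel, one plausible reading of the spiral scanning strategy. Both functions (`interleave_tokens`, `spiral_scan_order`) are hypothetical names for exposition; the released code may implement these steps differently.

```python
def interleave_tokens(seq_a, seq_b):
    """Alternate tokens from two equal-length modality sequences
    (a CISS-style interleaving sketch, not the paper's exact code)."""
    return [tok for pair in zip(seq_a, seq_b) for tok in pair]


def spiral_scan_order(h, w):
    """Visit every cell of an h x w patch in an outward spiral that
    starts at the centre, so the central pixel is scanned first.
    Assumed traversal: right, down, left, up with growing step lengths."""
    cy, cx = h // 2, w // 2
    y, x = cy, cx
    order = [(y, x)]
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    step, d = 1, 0
    while len(order) < h * w:
        for _ in range(2):  # step length grows every two turns
            dy, dx = dirs[d]
            for _ in range(step):
                y, x = y + dy, x + dx
                # Walk may leave the patch; collect only in-bounds cells.
                if 0 <= y < h and 0 <= x < w and len(order) < h * w:
                    order.append((y, x))
            d = (d + 1) % 4
        step += 1
    return order
```

For a 3×3 patch, `spiral_scan_order(3, 3)` begins at the centre `(1, 1)` and covers all nine cells exactly once, so a sequence model consuming tokens in this order sees the pixel to be classified before its context.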