HMAFNet: Hybrid Mamba-Attention Fusion Network for Remote Sensing Image Semantic Segmentation

Published: 01 Jan 2025 · Last Modified: 13 May 2025 · IEEE Geosci. Remote. Sens. Lett. 2025 · CC BY-SA 4.0
Abstract: Remote sensing (RS) images contain rich ground information, diverse object types, and large scale variations, all of which make precise segmentation difficult. Recently, Mamba, an improved state-space model, has offered global modeling capability at linear computational complexity. However, it still extracts insufficient global information along the spatial and channel dimensions, which is crucial for accurate segmentation, and it lacks sensitivity to local details. To address these issues, we propose a hybrid Mamba-attention fusion network (HMAFNet) for RS image semantic segmentation, built on an encoder-decoder architecture. Specifically, the encoder incorporates a spatial-channel Mamba (SCMamba) module, which uses Mamba to efficiently capture global feature representations across both the spatial and channel dimensions, while a parallel convolutional branch supplies the local information essential to the encoding phase. In the decoding phase, we propose an information-guided cross fusion (IGCF) module, which generates corresponding features via convolution-based and Mamba-based information-guided branches; a cross-attention mechanism then lets the two sets of features interact and fuse, preserving fine details and further eliminating semantic differences. Extensive comparison and ablation experiments on the Vaihingen and Potsdam datasets show that the proposed HMAFNet achieves better segmentation results.
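The abstract does not include code, but the cross-attention exchange described for the IGCF module can be sketched in isolation. The snippet below is a minimal NumPy illustration, not the paper's implementation: the function name, the (tokens × channels) feature shapes, and the choice to let each branch attend to the other and sum the results are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(f_conv, f_mamba):
    """Hypothetical sketch of a cross-attention fusion of two branches.

    f_conv:  (N, C) features from a convolutional (local-detail) branch
    f_mamba: (N, C) features from a Mamba (global-context) branch
    Each branch queries the other; the exchanged features are summed.
    """
    scale = np.sqrt(f_conv.shape[1])
    # conv-branch tokens attend to the Mamba branch (inject global context)
    conv_to_mamba = softmax(f_conv @ f_mamba.T / scale) @ f_mamba
    # Mamba-branch tokens attend to the conv branch (inject local detail)
    mamba_to_conv = softmax(f_mamba @ f_conv.T / scale) @ f_conv
    return conv_to_mamba + mamba_to_conv

# toy usage: 16 spatial tokens with 8 channels per branch
rng = np.random.default_rng(0)
fused = cross_attention_fuse(rng.standard_normal((16, 8)),
                             rng.standard_normal((16, 8)))
print(fused.shape)  # (16, 8)
```

In practice such a module would use learned query/key/value projections and operate on flattened 2-D feature maps; this sketch only shows the bidirectional attention-and-fuse pattern the abstract describes.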