Gated-MCBAM: Cross-Modal Block Attention Module with Gating Mechanism for Remote Sensing Segmentation

Published: 01 Jan 2024, Last Modified: 16 Oct 2025 · WHISPERS 2024 · CC BY-SA 4.0
Abstract: Semantic segmentation in remote sensing often relies on multiple modalities to improve the performance of deep learning models. However, efficiently fusing complementary information from different modalities remains challenging, as most existing approaches use simple weight-based fusion or channel concatenation. In this paper, we propose the Gated Multi Cross-modal Block Attention Module (Gated-MCBAM), a novel cross-modal attention mechanism that effectively models the interactions between Multispectral Image (MSI) and Synthetic Aperture Radar (SAR) features at different scales. Our approach extends the traditional convolutional block attention module to handle multi-modal features by introducing cross-modal interactions, enabling bi-directional feature refinement in which each modality enhances its representation using complementary information from the other. In addition, the proposed method employs a gating mechanism to control the significance of the refined features. We demonstrate quantitatively and qualitatively, through several experiments on the WHISPERS 2024 MMSeg-YREB dataset, that our proposed method improves segmentation performance.
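The abstract does not specify the exact layer design, but the described idea (a CBAM-style block where each modality's channel and spatial attention is driven by the other modality, with a learned gate blending refined and original features) can be sketched as follows. The module names, the reduction ratio, and the sigmoid-gated residual blend are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossModalCBAM(nn.Module):
    """Illustrative sketch of a gated cross-modal CBAM block.

    Attention maps for one modality are computed from the other
    (cross-modal interaction), and a learned sigmoid gate
    (hypothetical design choice) blends the refined features with
    the originals.
    """

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel-attention MLP, shared between avg- and max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention over concatenated avg/max maps, as in CBAM
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        # Gate controlling how much cross-modal refinement is applied
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def channel_att(self, guide):
        b, c, _, _ = guide.shape
        avg = self.mlp(guide.mean(dim=(2, 3)))
        mx = self.mlp(guide.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

    def spatial_att(self, guide):
        avg = guide.mean(dim=1, keepdim=True)
        mx = guide.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.spatial(torch.cat([avg, mx], dim=1)))

    def forward(self, x, guide):
        # Refine x with attention derived from the complementary modality
        refined = x * self.channel_att(guide)
        refined = refined * self.spatial_att(guide)
        # Gate blends refined features with the original representation
        g = self.gate(torch.cat([x, guide], dim=1))
        return g * refined + (1 - g) * x


# Bi-directional refinement between MSI and SAR feature maps at one scale
msi = torch.randn(2, 64, 32, 32)
sar = torch.randn(2, 64, 32, 32)
block = CrossModalCBAM(64)
msi_refined = block(msi, sar)  # MSI enhanced by SAR cues
sar_refined = block(sar, msi)  # SAR enhanced by MSI cues
```

In a full segmentation network, one such block would be applied per encoder scale, consistent with the multi-scale fusion described in the abstract.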