VF-Mamba: See What Matters First for Remote Sensing Semantic Segmentation

16 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Semantic segmentation, Mamba Model, Attention Mechanism, Remote Sensing
TL;DR: We propose Visual Focus Mamba (VF-Mamba), which introduces a Saliency-Driven Scanning (SDS) mechanism to guide adaptive, content-aware processing.
Abstract: While Mamba models offer efficient global modeling, their fixed scanning order limits adaptation to complex scenes. This limitation is critical in remote sensing segmentation, where fine-grained understanding of rare objects is required. In this paper, we propose Visual Focus Mamba (VF-Mamba), a saliency-guided state space model that scans informative regions first. VF-Mamba is inspired by the observation that the human visual system prioritizes salient regions for progressive scene interpretation. Following this principle, we develop Saliency-Driven Scanning (SDS), which ranks patches by their distributional distinctiveness. The underlying assumption is that samples with large distributional disparities tend to belong to rare categories or regions of interest, so scanning them first improves segmentation performance. We propose two ranking strategies that trade off robustness and efficiency: Full-SDS employs pairwise Wasserstein distances for precise measurement, while Sparse-SDS uses a Gaussian reference for efficient approximation. We evaluate VF-Mamba on two benchmark datasets. Compared with state-of-the-art methods, it achieves higher accuracy and better preserves fine details in small classes, including a 3.28\% IoU improvement on challenging categories.
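
The abstract only sketches SDS at a high level, so the snippet below is a minimal, hypothetical illustration of how such a saliency-driven scan ordering could look, not the authors' implementation. It assumes each patch is summarized as a 1-D Gaussian over its channel activations, for which the 2-Wasserstein distance has the closed form W2((m1,s1),(m2,s2))^2 = (m1-m2)^2 + (s1-s2)^2; the function names `full_sds_order` and `sparse_sds_order` and all shapes are assumptions.

```python
# Hypothetical sketch of saliency-driven scan ordering (not the paper's code).
# Patch features are summarized as 1-D Gaussians (per-patch mean and std),
# and patches with larger distributional disparity are scanned first.
import torch


def full_sds_order(feats: torch.Tensor) -> torch.Tensor:
    """Full-SDS (sketch): rank patches by summed pairwise 2-Wasserstein distance.

    feats: (N, C) patch features. Returns a permutation of patch indices,
    most distinctive patches first.
    """
    mu = feats.mean(dim=1)       # (N,) per-patch mean
    sigma = feats.std(dim=1)     # (N,) per-patch std
    # Pairwise squared 2-Wasserstein distances between 1-D Gaussians.
    w2 = (mu[:, None] - mu[None, :]) ** 2 + (sigma[:, None] - sigma[None, :]) ** 2
    saliency = w2.sum(dim=1)     # distinctiveness of each patch, O(N^2) overall
    return torch.argsort(saliency, descending=True)


def sparse_sds_order(feats: torch.Tensor) -> torch.Tensor:
    """Sparse-SDS (sketch): compare each patch to a single Gaussian reference."""
    mu = feats.mean(dim=1)
    sigma = feats.std(dim=1)
    # One reference Gaussian fitted to all patches; O(N) instead of O(N^2).
    ref_mu, ref_sigma = mu.mean(), sigma.mean()
    saliency = (mu - ref_mu) ** 2 + (sigma - ref_sigma) ** 2
    return torch.argsort(saliency, descending=True)


if __name__ == "__main__":
    patches = torch.randn(64, 192)      # e.g. an 8x8 grid of 192-dim patch features
    order = sparse_sds_order(patches)
    reordered = patches[order]          # sequence that would be fed to the Mamba block
```

Under these assumptions, the two variants differ only in the reference set: Full-SDS compares every patch against every other patch, while Sparse-SDS replaces the pairwise comparison with a single global Gaussian, trading some precision for linear cost.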
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6685