Robust Salient Object Detection in Optical Remote Sensing Images via Multiscale Contextual Attention and Feature Enhancement

Published: 2025, Last Modified: 06 Nov 2025. Venue: IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2025. License: CC BY-SA 4.0
Abstract: Salient object detection in optical remote sensing images (ORSI-SOD) is challenged by complex scene structures, diverse spatial distributions, and imaging-induced issues such as shadow interference and low contrast. Existing methods often focus heavily on adjacent-level contextual aggregation, while overlooking the enhancement of intralevel features and the effective integration of multilevel information. This leads to limited semantic discrimination and weak preservation of structural details. To address these issues, we propose a robust multiscale contextual attention and feature enhancement network (RoCAFE-Net). For each level, we introduce a parallel dual-branch structure comprising the spatial displacement self-attention module (SDSM) and the channel and spatial detail perception module (CSDPM). SDSM enhances object localization by explicitly modeling spatial arrangements and positional diversity, while CSDPM focuses on suppressing noise and preserving fine structural details through complementary channel and spatial modeling. These two branches collaboratively refine features before fusion. Although both modules refine features at individual levels, direct cross-level fusion may introduce redundant information and semantic inconsistency. To mitigate this, we design the adaptive feature fusion head, which dynamically integrates multilevel features by modeling interchannel dependencies and enhancing spatial attention. Extensive experiments on three challenging ORSI-SOD datasets demonstrate that RoCAFE-Net achieves superior performance compared to state-of-the-art methods.
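The data flow described in the abstract (two parallel branches refining each level, followed by a fusion step with channel and spatial reweighting) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not specify the internals of SDSM, CSDPM, or the adaptive feature fusion head, so the simple sigmoid gates below are hypothetical stand-ins, as are all function names, the summation used to merge the branches, the nearest-neighbour resizing, and the assumption that every level has the same channel count and square feature maps.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat):
    """Stand-in channel reweighting (hypothetical, not CSDPM or the fusion head).

    feat has shape (C, H, W); each channel is scaled by a gate in (0, 1)
    computed from its global average.
    """
    weights = sigmoid(feat.mean(axis=(1, 2)))
    return feat * weights[:, None, None]


def spatial_gate(feat):
    """Stand-in spatial reweighting (hypothetical, not SDSM).

    Each pixel is scaled by a gate in (0, 1) computed from its mean over channels.
    """
    mask = sigmoid(feat.mean(axis=0, keepdims=True))  # shape (1, H, W)
    return feat * mask


def refine_level(feat):
    """Apply two parallel branches to the same input and merge them by summation."""
    return channel_gate(feat) + spatial_gate(feat)


def upsample_nearest(feat, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size)."""
    _, h, w = feat.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return feat[:, rows][:, :, cols]


def fuse_levels(feats):
    """Refine each level, resize to the finest resolution, average, then gate.

    feats is a list of (C, H_i, H_i) arrays ordered from finest to coarsest.
    """
    size = feats[0].shape[1]
    refined = [upsample_nearest(refine_level(f), size) for f in feats]
    merged = np.mean(refined, axis=0)  # shape (C, size, size)
    return spatial_gate(channel_gate(merged))


# Toy usage with three random feature levels of decreasing resolution.
rng = np.random.default_rng(0)
levels = [rng.standard_normal((4, s, s)) for s in (8, 4, 2)]
fused = fuse_levels(levels)
print(fused.shape)
```

The sketch only shows the ordering of operations: per-level refinement happens before any cross-level mixing, and the reweighting in the fusion step is applied to the merged features rather than to each level separately.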