Cross-Model Nested Fusion Network for Salient Object Detection in Optical Remote Sensing Images

Published: 2025 · Last Modified: 15 Jan 2026 · IEEE Trans. Cybern. 2025 · CC BY-SA 4.0
Abstract: Recently, salient object detection (SOD) in optical remote sensing images, dubbed ORSI-SOD, has attracted increasing research interest. Although deep learning-based models have achieved impressive performance, several challenges remain: a single image may contain multiple objects with varying scales, complex topological structures, and background interference. These unresolved issues make ORSI-SOD a challenging task. To address them, we introduce a distinctive cross-model nested fusion network (CMNFNet), which leverages heterogeneous features to improve ORSI-SOD performance. Specifically, the proposed model comprises two heterogeneous encoders: a conventional CNN-based encoder that models local features, and a specially designed graph convolutional network (GCN)-based encoder with both local and global receptive fields that models local and global features simultaneously. To effectively distinguish multiple salient objects of different sizes or complex topological structures within an image, we project the image into two graphs with different receptive fields and conduct message passing through two parallel graph convolutions. Finally, the heterogeneous features extracted from the two encoders are fused in the well-designed attention-enhanced cross-model nested fusion module (AECMNFM). This module is meticulously crafted to integrate features progressively, allowing the model to adaptively suppress background interference while refining the feature representations. We conducted comprehensive experiments on benchmark datasets; the results demonstrate the superiority of our CMNFNet over 16 state-of-the-art (SOTA) models.
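The abstract's core GCN idea, projecting an image into two graphs with different receptive fields and running two parallel graph convolutions, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`normalize_adj`, `parallel_gcn`), the choice of symmetric adjacency normalization, ReLU activation, and channel-wise concatenation of the two branches are all assumptions introduced here for illustration.

```python
import numpy as np

def normalize_adj(A):
    # Symmetrically normalize an adjacency matrix with self-loops:
    # D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def parallel_gcn(X, A_local, A_global, W_local, W_global):
    # Two parallel graph convolutions over graphs with different
    # receptive fields (e.g., 1-hop vs. multi-hop connectivity),
    # followed by channel-wise concatenation of the two branches.
    H_local = np.maximum(normalize_adj(A_local) @ X @ W_local, 0.0)    # ReLU
    H_global = np.maximum(normalize_adj(A_global) @ X @ W_global, 0.0)  # ReLU
    return np.concatenate([H_local, H_global], axis=1)

# Toy usage on a 5-node chain graph with 4-dimensional node features.
# A_global is built here as the 2-hop closure of A_local, one simple
# (assumed) way to obtain a larger receptive field.
rng = np.random.default_rng(0)
A_local = np.zeros((5, 5))
for i in range(4):
    A_local[i, i + 1] = A_local[i + 1, i] = 1.0
A_global = ((A_local + A_local @ A_local) > 0).astype(float)
np.fill_diagonal(A_global, 0.0)

X = rng.standard_normal((5, 4))
W_local = rng.standard_normal((4, 3))
W_global = rng.standard_normal((4, 3))
H = parallel_gcn(X, A_local, A_global, W_local, W_global)  # shape (5, 6)
```

Each branch aggregates features over a different neighborhood size, so small and large salient structures are captured in parallel before the concatenated features are handed to the fusion stage.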