Abstract: The unique imaging conditions of satellites introduce significant uncertainties in the structure and scale of ground objects, presenting a major challenge for optical remote sensing image salient object detection (ORSI-SOD). Current ORSI-SOD methods often fail to differentiate salient objects from subtle background variations, leading to suboptimal predictions. Furthermore, ORSI-SOD is a dense pixel prediction task, and existing approaches frequently depend on pixel-level probabilities, which can result in overconfident and inaccurate predictions. To address these challenges, we reformulate ORSI-SOD as a mask-generation problem and propose a diffusion-model-based method, termed ORSIDiff. Central to our approach is a powerful denoising network that enhances the model’s refinement capabilities. This network combines global and local modeling, improving the handling of salient object details and enabling a deeper understanding of the distinctions between salient objects and their surroundings. Additionally, we introduce a consistency assessment strategy (CAS) that aggregates multiple potential predictions during the denoising process, mitigating the issue of overconfident point estimation. Extensive experiments on two widely used ORSI-SOD datasets demonstrate that ORSIDiff achieves significant performance improvements over 20 state-of-the-art (SOTA) methods.
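The abstract does not specify how CAS aggregates its candidate predictions, so the following is only a generic illustration of the idea of fusing several candidate masks instead of trusting a single point estimate. It is not the authors' method: the function name `aggregate_masks`, the use of a per-pixel mean as the fused probability, the per-pixel variance as a disagreement score, and the 0.5 threshold are all assumptions made for this sketch.

```python
from statistics import mean, pvariance


def aggregate_masks(candidates, threshold=0.5):
    """Fuse several candidate saliency maps of identical size.

    Hypothetical illustration, not the CAS from the paper.
    Each candidate is a list of rows with values in [0, 1].
    Returns (fused, disagreement, binary):
      fused        - per-pixel mean over the candidates
      disagreement - per-pixel population variance over the candidates
      binary       - fused map thresholded at `threshold`
    """
    if not candidates:
        raise ValueError("need at least one candidate mask")
    height, width = len(candidates[0]), len(candidates[0][0])
    fused, disagreement, binary = [], [], []
    for r in range(height):
        fused_row, var_row, bin_row = [], [], []
        for c in range(width):
            # Collect this pixel's value from every candidate mask.
            values = [mask[r][c] for mask in candidates]
            avg = mean(values)
            fused_row.append(avg)
            var_row.append(pvariance(values))
            bin_row.append(1 if avg >= threshold else 0)
        fused.append(fused_row)
        disagreement.append(var_row)
        binary.append(bin_row)
    return fused, disagreement, binary
```

In this sketch, pixels on which the candidates agree have zero variance, while pixels with conflicting candidates receive a nonzero disagreement score that a downstream step could use to temper confidence.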
External IDs: dblp:journals/tgrs/HanSWSL25