Scene-Aware Background Decoupling via Collaborative Fusion for Video Salient Object Detection

Shuyao Wang, Tingwei Liu, Yongri Piao, Miao Zhang

Published: 2025, Last Modified: 25 Jul 2025. IEEE Signal Process. Lett. 2025. License: CC BY-SA 4.0

Abstract: Video Salient Object Detection (VSOD) faces significant challenges due to complex background disturbances in video sequences. Although notable progress has been made in this field, background interference is still not adequately addressed. To address this challenge, we propose a scene-aware background decoupling network (SBDNet) equipped with a scene-aware background decoupling strategy (SBDS) and collaborative fusion decoders (CFD). The SBDS includes a Dynamic Background Disentanglement Module (DBD) designed to eliminate background distractions in real-world scenes. The DBD achieves this by using a background mask to refine foreground elements and by extracting semantic-enhanced context weights. The CFD enhances the fusion between CNN and Transformer decoders to improve the accuracy of saliency detection in video sequences. Extensive experimental results demonstrate that SBDNet significantly outperforms 14 state-of-the-art methods on four widely used benchmark datasets.