Abstract: Video Salient Object Detection (VSOD) faces significant challenges due to complex background disturbances in video sequences. Although notable progress has been made in this field, background interference remains insufficiently addressed. To tackle this problem, we propose a scene-aware background decoupling network (SBDNet) equipped with a scene-aware background decoupling strategy (SBDS) and collaborative fusion decoders (CFD). The SBDS includes a Dynamic Background Disentanglement Module (DBD) designed to eliminate background distractions in real-world scenes. The DBD achieves this by using a background mask to refine foreground elements and by extracting semantic-enhanced context weights. The CFD strengthens the fusion between the CNN and Transformer decoders, ensuring high saliency detection accuracy on video sequences. Extensive experiments demonstrate that SBDNet significantly outperforms 14 state-of-the-art methods on four widely used benchmark datasets.