Abstract: Fully convolutional networks (FCNs) have shown extraordinary performance in salient object detection (SOD). However, when faced with salient objects that vary widely in type and size, FCN-based methods may still produce under-segmented saliency maps with inaccurate or incomplete object information, which is mainly caused by sub-optimal multi-scale context features and inadequate interaction of complementary information. In this paper, we explore an effective structure for capturing contextual information and exchanging complementary information for accurate SOD. Specifically, we first design a multi-source contextual cue extraction (MCCE) module to effectively capture context information with different receptive fields and aggregate it to improve the expressive ability of the initial input features. Furthermore, we build a dual-residual adjacent feature interaction (DAFI) module to increase the exchange of high-level semantic and low-level spatial detail information across multi-level features for better saliency prediction. Finally, extensive experimental results demonstrate that our method performs favorably against 15 state-of-the-art methods on five public SOD datasets.
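To make the multi-receptive-field idea behind the MCCE module concrete, the following is a minimal PyTorch sketch of one common way such a block is realized: parallel dilated convolutions with different dilation rates, whose responses are aggregated and fused back into the input features. The class name `MultiScaleContext`, the dilation rates, and the residual fusion scheme are all illustrative assumptions; the abstract does not specify the actual MCCE design.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    """Hypothetical sketch of a multi-receptive-field context block.

    NOTE: this is an assumption-based illustration, not the paper's
    actual MCCE module. Parallel dilated 3x3 convolutions capture
    context at different receptive fields; a 1x1 convolution aggregates
    them, and a residual connection enriches the input features.
    """

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        # One branch per dilation rate; padding == dilation keeps
        # spatial size unchanged for a 3x3 kernel.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        )
        # Aggregate the concatenated multi-scale responses.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        context = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(context)  # residual enrichment


if __name__ == "__main__":
    block = MultiScaleContext(64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56]) -- shape preserved
```

A design of this shape preserves the feature resolution while widening the effective receptive field, which is why dilated-convolution pyramids are a common backbone for context-aggregation modules in SOD networks.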