Efficient co-salient object detection by integrating mask consensus and attention diversion

Published: 01 Jan 2025, Last Modified: 25 Jul 2025. Signal Image Video Process., 2025. License: CC BY-SA 4.0
Abstract: Co-salient object detection (CoSOD) aims to identify common, distinctive objects across a set of images. These objects share features that make them stand out against the background and other objects. This paper introduces a novel co-salient object detection model that integrates mask consensus and attention diversion (MCAD), leveraging multi-dimensional global contextual semantic information to accurately segment the common salient objects in multiple images. First, in the feature encoding stage, an encoder extracts initial features from a set of related images, reducing dimensionality and capturing texture and other low-level cues. Then, the Mask Consistency Awareness Module (MCAM) extracts global semantic information and group collaboration cues to generate co-saliency prototypes. These prototypes are passed to the Dual-dimensional Fused Enhancement Module (DFEM) for feature enhancement and noise suppression. To strengthen the discriminative representation of different salient objects and balance inter-channel relationships with location information, we propose a split pooling and re-fusion strategy within the DFEM. We evaluated our model on three challenging CoSOD benchmark datasets using four widely accepted metrics and compared it with eight state-of-the-art methods. Experimental results demonstrate that MCAD outperforms existing cutting-edge methods for co-salient object detection. Our code and saliency maps are available at https://github.com/ChenYurui616/MCADNet.
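The abstract describes a pipeline of encoder → MCAM → DFEM → saliency prediction. The sketch below is a minimal, hypothetical PyTorch rendering of that data flow, not the authors' implementation: the internals of MCAM (consensus/prototype generation) and DFEM (split pooling and re-fusion) are not specified in the abstract, so every class, shape, and operation here is an illustrative assumption.

```python
# Hypothetical sketch of the MCAD data flow described in the abstract.
# Module internals are placeholder assumptions, not the paper's design.
import torch
import torch.nn as nn


class MCAM(nn.Module):
    """Placeholder Mask Consistency Awareness Module: pools group-level
    semantics into a co-saliency prototype and injects it back (assumed)."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats):                      # feats: (N, C, H, W), one image group
        group_ctx = feats.mean(dim=0, keepdim=True)        # shared group cue
        prototype = self.proj(group_ctx)                   # co-saliency prototype
        return feats * torch.sigmoid(prototype)            # consensus-weighted features


class DFEM(nn.Module):
    """Placeholder Dual-dimensional Fused Enhancement Module: separate
    channel and spatial gating branches, then a simple re-fusion (assumed)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, feats):
        ch = feats * self.channel_gate(feats)              # inter-channel branch
        sp = feats * self.spatial_gate(feats)              # location (spatial) branch
        return ch + sp                                     # re-fuse the two branches


class MCADSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, channels, 3, padding=1)  # stand-in for the real backbone
        self.mcam, self.dfem = MCAM(channels), DFEM(channels)
        self.head = nn.Conv2d(channels, 1, 1)                 # co-saliency map head

    def forward(self, images):                     # images: (N, 3, H, W), related group
        feats = self.encoder(images)
        feats = self.mcam(feats)
        feats = self.dfem(feats)
        return torch.sigmoid(self.head(feats))


maps = MCADSketch()(torch.rand(4, 3, 128, 128))    # -> (4, 1, 128, 128) saliency maps
```

The sketch only mirrors the module ordering stated in the abstract; the group-mean prototype and the two-branch gating are generic stand-ins for the consensus and split pooling/re-fusion mechanisms described by the authors.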