Edge and semantic collaboration framework with cross coordination attention for co-saliency detection

Published: 2025, Last Modified: 08 Apr 2026, Knowl. Based Syst. 2025, License: CC BY-SA 4.0
Abstract: Co-salient object detection (CoSOD) aims to identify salient objects that appear concurrently in a set of related images. Existing CoSOD methods emphasize mining consensus clues but tend to overlook the implicit edge and semantic information expressed at different levels of the image features. To address this limitation, we propose a novel edge and semantic collaboration framework (ESCF). ESCF uses high-level features to align semantic information from highly activated salient regions, while using low-level features to refine edge contours for improved clarity. To eliminate distracting elements within images, we introduce a cross coordination attention module (CCAM) that calibrates the channel weights in each parallel branch using global information. Through interactive learning across feature layers, the model becomes better at capturing position-sensitive details while minimizing noise. Furthermore, we design a consensus learning module (CLM) that attends to spatial and channel information across multi-scale features in parallel. The CLM extracts consensus clues from the resulting multivariate attention maps, which helps the network infer co-salient targets and guides it toward accurate predictions. We evaluate our method on three challenging CoSOD benchmark datasets using four widely recognized metrics, and the experimental results demonstrate that our approach outperforms existing CoSOD methods.
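The abstract gives no equations for the CCAM. Its name and the phrase "position-sensitive details" suggest it builds on coordinate attention (Hou et al., 2021), which pools features separately along height and width to produce direction-aware gates; that is our assumption, not something the abstract states. The NumPy sketch below is hypothetical throughout: the functions `coord_gates`, `channel_gate`, and `cross_coordination`, the random weights, and the choice to let each branch be recalibrated by the *other* branch's coordinate gates (our reading of "cross") are illustrations only. Learned 1×1 convolutions are replaced by plain matrices and normalization layers are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_gates(feat, w_reduce, w_h, w_w):
    """Direction-aware gates in the style of coordinate attention (assumed basis for CCAM).

    feat: (C, H, W); w_reduce: (C_r, C); w_h, w_w: (C, C_r).
    Returns gate_h with shape (C, H, 1) and gate_w with shape (C, 1, W).
    """
    H = feat.shape[1]
    pool_h = feat.mean(axis=2)                        # (C, H): average over width
    pool_w = feat.mean(axis=1)                        # (C, W): average over height
    joint = np.concatenate([pool_h, pool_w], axis=1)  # (C, H + W)
    hidden = np.maximum(w_reduce @ joint, 0.0)        # shared channel reduction + ReLU
    gate_h = sigmoid(w_h @ hidden[:, :H])[:, :, None]  # one gate per row
    gate_w = sigmoid(w_w @ hidden[:, H:])[:, None, :]  # one gate per column
    return gate_h, gate_w

def channel_gate(feat, w_c):
    """Global channel calibration from global average pooling (SE-style)."""
    g = feat.mean(axis=(1, 2))                # (C,)
    return sigmoid(w_c @ g)[:, None, None]    # (C, 1, 1)

def cross_coordination(low, high, params):
    """Hypothetical two-branch exchange: each branch keeps its own channel gate
    but is spatially recalibrated by the other branch's coordinate gates."""
    gh_low, gw_low = coord_gates(low, *params["low"])
    gh_high, gw_high = coord_gates(high, *params["high"])
    low_out = low * channel_gate(low, params["c_low"]) * gh_high * gw_high
    high_out = high * channel_gate(high, params["c_high"]) * gh_low * gw_low
    return low_out + high_out

# Usage with random (untrained) weights, only to check shapes and data flow.
rng = np.random.default_rng(0)
C, Cr, H, W = 8, 2, 6, 5

def branch_params():
    return (rng.standard_normal((Cr, C)) * 0.1,
            rng.standard_normal((C, Cr)) * 0.1,
            rng.standard_normal((C, Cr)) * 0.1)

params = {"low": branch_params(), "high": branch_params(),
          "c_low": rng.standard_normal((C, C)) * 0.1,
          "c_high": rng.standard_normal((C, C)) * 0.1}
low = rng.standard_normal((C, H, W))
high = rng.standard_normal((C, H, W))  # assumed already upsampled to the low-level size
fused = cross_coordination(low, high, params)
print(fused.shape)  # (8, 6, 5)
```

Because every gate lies in (0, 1), the fused output can never exceed the combined magnitude of the two inputs at any position, so the module can only reweight (and thereby suppress) responses. In a trained network the matrices would be learned rather than drawn at random.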
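The abstract describes the CLM only at a high level. A common way for CoSOD models to turn a shared "consensus" into a per-pixel map is to average normalized pixel embeddings over the whole image group and score each pixel by its cosine similarity to that prototype; we assume something similar here, purely for illustration. In the hypothetical sketch below, `consensus_map` runs a channel branch and a spatial branch in parallel, combines them, and correlates the attended features with a group prototype, while `multi_scale_consensus` repeats this at each scale and averages the resized maps. None of these names, nor the nearest-neighbour resizing, come from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def l2_normalize(x, axis):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def consensus_map(group_feats):
    """group_feats: (N, C, H, W) features of N related images at one scale.
    Returns (N, H, W) consensus maps in [0, 1]."""
    # Channel branch: one channel descriptor shared by the whole group.
    chan = group_feats.mean(axis=(2, 3))                           # (N, C)
    chan_att = sigmoid(chan.mean(axis=0))                          # (C,)
    # Spatial branch, computed in parallel: per-image spatial attention.
    spat_att = sigmoid(group_feats.mean(axis=1, keepdims=True))   # (N, 1, H, W)
    attended = group_feats * chan_att[None, :, None, None] * spat_att
    # Consensus prototype: mean of L2-normalized pixel embeddings over the group.
    N, C, H, W = attended.shape
    pix = l2_normalize(attended.transpose(0, 2, 3, 1).reshape(-1, C), axis=1)
    proto = l2_normalize(pix.mean(axis=0), axis=0)                 # (C,)
    corr = (pix @ proto).reshape(N, H, W)                          # cosine similarity
    return (corr + 1.0) / 2.0                                      # map [-1, 1] to [0, 1]

def multi_scale_consensus(feats_per_scale, out_hw):
    """Average per-scale consensus maps after nearest-neighbour resizing to out_hw."""
    H, W = out_hw
    maps = []
    for f in feats_per_scale:
        m = consensus_map(f)                                       # (N, h, w)
        h, w = m.shape[1:]
        rows = np.arange(H) * h // H
        cols = np.arange(W) * w // W
        maps.append(m[:, rows][:, :, cols])                        # (N, H, W)
    return np.mean(maps, axis=0)

# Usage: a group of 4 images with features at two scales (random stand-ins).
rng = np.random.default_rng(0)
N = 4
scales = [rng.standard_normal((N, 16, 8, 8)), rng.standard_normal((N, 32, 4, 4))]
co_map = multi_scale_consensus(scales, out_hw=(8, 8))
print(co_map.shape)  # (4, 8, 8)
```

Averaging the prototype over all images is what makes the map group-aware: a pixel scores highly only if its attended embedding resembles what the group has in common, not merely what is salient in its own image.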