Keywords: Salient object detection, Camouflaged object detection, SAM, Benchmark
TL;DR: We introduce a benchmark called Unconstrained Salient and Camouflaged Object Detection \textbf{(USCOD)}, which supports the simultaneous detection of salient and camouflaged objects in unconstrained scenes, regardless of whether either type of object is present.
Abstract: Visual Salient Object Detection (SOD) and Camouflaged Object Detection (COD) are two interrelated yet distinct tasks; both model the human visual system's ability to perceive the presence of objects. Traditional SOD datasets and methods are designed for scenes containing only salient objects, and COD datasets and methods likewise assume scenes containing only camouflaged objects. Scenes where both salient and camouflaged objects coexist, or where neither is present, are not considered, which oversimplifies existing research on SOD and COD. In this paper, to explore a more generalized approach to SOD and COD, we introduce a benchmark called Unconstrained Salient and Camouflaged Object Detection \textbf{(USCOD)}, which supports the simultaneous detection of salient and camouflaged objects in unconstrained scenes, regardless of whether either type of object is present. To this end, we construct a large-scale dataset, \textbf{CS12K}, that covers four distinct scene types: scenes containing only salient objects, scenes containing only camouflaged objects, scenes where both coexist, and scenes containing neither. In our benchmark experiments, we find that a major challenge in USCOD is distinguishing salient objects from camouflaged objects within a single model. To address this, we propose a USCOD baseline, \textbf{\ourmodel}, which freezes the SAM mask decoder for mask reconstruction, allowing the model to focus on distinguishing between salient and camouflaged objects. Furthermore, to evaluate a model's ability to distinguish salient from camouflaged objects, we design a metric called the Camouflage-Saliency Confusion Score (\textbf{CSCS}). The proposed method achieves state-of-the-art performance on the newly introduced USCOD task. The code and dataset will be made publicly available.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 38