Weakly Supervised Salient Object Detection with Text Supervision

Published: 2026, Last Modified: 24 Feb 2026. Int. J. Comput. Vis. 2026. CC BY-SA 4.0
Abstract: Weakly supervised salient object detection using image-category supervision offers a cost-effective alternative to dense annotations, yet suffers from significant performance degradation. This degradation is primarily attributed to the limitations of existing pseudo-label generation methods, which tend to either under- or over-activate object regions and indiscriminately label all non-activated pixels as background, introducing considerable label noise. Furthermore, these methods are limited in their ability to capture objects beyond the pre-trained category set. To overcome these challenges, we propose a CLIP-based pseudo-label generation approach that exploits text prompts to jointly activate generic background and salient objects, breaking the dependency on specific categories. However, we find that this paradigm faces three challenges: optimal prompt uncertainty, background redundancy, and object-background conflict. To mitigate these, we propose three key modules. First, spatial distribution-guided prompt selection evaluates the spatial distribution of activation regions to identify the optimal prompt. Second, center and scale prior-guided activation refinement integrates self-attention and superpixel cues to suppress background noise. Third, learning feedback-guided pseudo-label update learns saliency knowledge from other pseudo-labels to resolve conflicting regions and iteratively refine supervision. Extensive experiments demonstrate that our method surpasses both previous weakly supervised methods that use image-category supervision and unsupervised approaches.