FreeSegDiff: Annotation-free Saliency Segmentation with Diffusion Models

Published: 2025 · Last Modified: 08 Feb 2026 · ICASSP 2025 · CC BY-SA 4.0
Abstract: By learning from large corpora of data, pre-trained models have achieved impressive progress. As a popular generative pre-training method, diffusion models stand out by capturing both low-level visual knowledge and high-level semantic relations. In this paper, we propose to exploit such knowledgeable pre-trained diffusion models for mainstream discriminative tasks such as annotation-free saliency segmentation. However, a notable structural discrepancy between generative and discriminative models poses a significant challenge to the direct application of diffusion models. Furthermore, the absence of explicit manually labeled data is a substantial barrier in annotation-free settings. To tackle these issues, we introduce FreeSegDiff, a novel two-stage synthesis-exploitation framework. In the first, synthesis stage, to alleviate data insufficiency, we synthesize abundant images and propose a novel training-free DiffusionCut to produce masks. In the second, exploitation stage, to bridge the structural gap, we employ an inversion technique to convert given images back into diffusion features, which integrate seamlessly with downstream architectures. Extensive experiments and ablation studies demonstrate the superiority of adapting diffusion models for annotation-free saliency segmentation.
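
The exploitation stage sketched in the abstract (mapping an image back into diffusion features and feeding them to a downstream segmentation head) can be approximated with off-the-shelf tools. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the Stable Diffusion checkpoint id, the single noised timestep used in place of a full inversion, the UNet mid-block as the feature source, and the `SaliencyHead` module are all illustrative choices not specified in the source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# Placeholder checkpoint id; any Stable Diffusion v1.x pipeline works the same way.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

features = {}

def save_feature(name):
    def hook(module, inputs, output):
        features[name] = output
    return hook

# Capture an intermediate UNet activation as the "diffusion feature" (assumed choice).
pipe.unet.mid_block.register_forward_hook(save_feature("mid"))

@torch.no_grad()
def extract_features(image, t=100, prompt=""):
    """image: (1, 3, 512, 512) tensor scaled to [-1, 1]."""
    latents = pipe.vae.encode(image.to(device)).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor
    timestep = torch.tensor([t], device=device)
    # Simplified stand-in for the paper's inversion step: noise the latent to timestep t.
    noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt").input_ids.to(device)
    text_emb = pipe.text_encoder(tokens)[0]
    # One UNet forward pass; the hook stores the mid-block feature map.
    pipe.unet(noisy, timestep, encoder_hidden_states=text_emb)
    return features["mid"]  # (1, 1280, 8, 8) for a 512x512 input

# Toy downstream head that turns the recovered features into a saliency map.
class SaliencyHead(nn.Module):
    def __init__(self, in_ch=1280):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, feat, out_size=(512, 512)):
        logits = F.interpolate(self.proj(feat), size=out_size,
                               mode="bilinear", align_corners=False)
        return torch.sigmoid(logits)
```

In the full framework one would presumably fuse features from several UNet blocks and timesteps, and train such a head on the pseudo-masks produced by DiffusionCut in the synthesis stage rather than on manual annotations.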