DiffuSaliency: Synthesizing Multi-object Images with Masks for Semantic Segmentation Using Diffusion and Saliency Detection
Abstract: High-quality pixel-level annotations are crucial for semantic segmentation, but manually labeling a large number of samples is extremely time-consuming and laborious. DiffuMask offers a feasible way to generate synthetic image-mask pairs; however, it generates only a single object per image, and its masks are sometimes not sufficiently fine. In this paper, we propose DiffuSaliency, which synthesizes multi-object images with masks for semantic segmentation using diffusion models and saliency detection. DiffuSaliency first uses a diffusion model to generate single-object images together with attention maps. It then derives two candidate masks for each image in two different ways: adaptive binarization and saliency detection. Finally, DiffuSaliency randomly selects one mask per image and blends the resulting single-object image-mask pairs into multi-object image-mask pairs. We conduct extensive experiments on the VOC 2012 and Cityscapes datasets and achieve promising results compared with other methods. Moreover, DiffuSaliency surpasses DiffuMask on the unseen classes of VOC 2012, achieving new state-of-the-art results.
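The two mask-producing steps described above, adaptive binarization of an attention map and blending two image-mask pairs into a multi-object pair, can be illustrated with a minimal sketch. The block-wise thresholding, the label assignment, and all function names below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def binarize_attention(attn, block=16, offset=0.05):
    """Adaptive binarization: threshold each pixel against the mean of its
    local block (a simplified stand-in for the paper's adaptive step)."""
    h, w = attn.shape
    mask = np.zeros_like(attn, dtype=np.uint8)
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = attn[y:y + block, x:x + block]
            mask[y:y + block, x:x + block] = (tile > tile.mean() + offset).astype(np.uint8)
    return mask

def blend_pairs(img_a, mask_a, img_b, mask_b, label_a=1, label_b=2):
    """Paste object b onto image a; where the two objects overlap,
    b's pixels and label overwrite a's (one possible blending rule)."""
    fg_b = mask_b.astype(bool)
    out_img = img_a.copy()
    out_img[fg_b] = img_b[fg_b]
    out_mask = np.where(fg_b, label_b,
                        np.where(mask_a.astype(bool), label_a, 0)).astype(np.uint8)
    return out_img, out_mask

# Toy example: a 32x32 attention map with a bright 8x8 square.
attn = np.zeros((32, 32), dtype=np.float32)
attn[8:16, 8:16] = 1.0
obj_mask = binarize_attention(attn)

img_a = np.zeros((32, 32, 3), dtype=np.uint8)           # background image
img_b = np.full((32, 32, 3), 255, dtype=np.uint8)       # second object's image
blended_img, blended_mask = blend_pairs(img_a, obj_mask * 0, img_b, obj_mask)
```

In this sketch only the bright square exceeds its block's mean, so the binarized mask recovers exactly that region, and the blended mask carries one class label per pasted object.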