Abstract: Text-to-image diffusion models have been applied to image segmentation, demonstrating the potential of diffusion models for segmentation. However, text often struggles to describe objects accurately, particularly at the level of fine-grained details. Sketches can address this issue to some extent. We observe that the intermediate features of a diffusion model guided by a sketch carry more effective semantic information for segmentation than those guided by text. We therefore propose Sketch2Seg, a sketch-based image segmentation method built on a diffusion model. By extracting intermediate features from a sketch-to-image diffusion model, only a simple pixel classifier needs to be trained. Quantitatively, our method reaches 78.47% and 81.07% mIoU on the PASCAL VOC and SketchySeg datasets, respectively, in a zero-shot setup. To support research on fine-grained sketch segmentation and detection, we contribute the SketchyCOCOSeg dataset, which provides segmentation annotations for the images of the SketchyCOCO dataset. Our code is available here.
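To make the recipe in the abstract concrete (a frozen sketch-conditioned diffusion backbone whose intermediate features feed a simple pixel classifier), here is a minimal PyTorch sketch. It is not the paper's implementation: `ToyDenoiser`, its layer names, the feature dimensions, and all hyperparameters are hypothetical stand-ins; only the per-pixel classifier is trained, as the abstract describes.

```python
# Hedged sketch, NOT the authors' code: freeze a (toy) sketch-conditioned
# denoiser, hook one of its intermediate activations, and train only a
# per-pixel linear classifier (a 1x1 convolution) on top of it.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a frozen sketch-to-image diffusion U-Net."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Conv2d(3 + 1, ch, 3, padding=1)   # image + 1 sketch channel
        self.mid = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)
        self.dec = nn.Conv2d(ch * 2, 3, 3, padding=1)

    def forward(self, x, sketch):
        h = F.relu(self.enc(torch.cat([x, sketch], dim=1)))
        m = F.relu(self.mid(h))                         # intermediate feature we hook
        return self.dec(F.interpolate(m, scale_factor=2))

denoiser = ToyDenoiser().eval()
for p in denoiser.parameters():
    p.requires_grad_(False)                             # backbone stays frozen

# Capture the intermediate activation with a forward hook.
feats = {}
denoiser.mid.register_forward_hook(lambda mod, inp, out: feats.update(mid=out))

num_classes = 21                                        # e.g. PASCAL VOC
pixel_clf = nn.Conv2d(64, num_classes, kernel_size=1)   # simple pixel classifier
opt = torch.optim.Adam(pixel_clf.parameters(), lr=1e-3)

def train_step(image, sketch, mask):
    """image: (B,3,H,W), sketch: (B,1,H,W), mask: (B,H,W) long class ids."""
    with torch.no_grad():
        denoiser(image, sketch)                         # populates feats['mid']
    f = F.interpolate(feats['mid'], size=mask.shape[-2:],
                      mode='bilinear', align_corners=False)
    logits = pixel_clf(f)                               # per-pixel class scores
    loss = F.cross_entropy(logits, mask)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy tensors just to show the call signature.
img = torch.randn(2, 3, 64, 64)
skt = torch.randn(2, 1, 64, 64)
msk = torch.randint(0, num_classes, (2, 64, 64))
print(train_step(img, skt, msk))
```

The design point this illustrates is the one the abstract emphasizes: all segmentation capacity comes from the frozen backbone's features, so the only trained component is a lightweight per-pixel readout.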