Diffusion Models for Open-Vocabulary Segmentation

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: computer vision, semantic segmentation, open-vocabulary segmentation
TL;DR: We leverage a pre-trained diffusion model to perform open-vocabulary semantic segmentation by sampling support images for feature correlation without further training.
Abstract: The variety of objects in the real world is unlimited and is thus impossible to capture using models trained on a closed, pre-defined set of categories. Recently, open-vocabulary recognition has garnered significant attention, largely facilitated by advances in large-scale vision-language modelling. In this paper, we present OVDiff, a novel method that leverages the generative properties of text-to-image diffusion models for open-vocabulary segmentation. Specifically, we propose to synthesise support image sets from arbitrary textual categories, creating for each category a set of prototypes representative of both the category itself and its surrounding context (background). Our method relies solely on pre-trained components: segmentation is obtained by simply comparing a target image to the prototypes without further fine-tuning. We show that our method can be used to ground any pre-trained self-supervised feature extractor in natural language and provide explainable predictions by mapping back to regions in the support set. Our approach shows strong performance on a range of open-vocabulary segmentation benchmarks, obtaining a lead of more than 10% over prior work on PASCAL VOC.
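The comparison step described in the abstract (matching target-image features against per-category prototypes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of cosine similarity, the max-over-prototypes scoring, and the rule that a pixel becomes background when any background prototype scores higher than every foreground prototype are all assumptions made for illustration. Feature extraction and the diffusion-based sampling of support images are omitted; the inputs are assumed to be precomputed feature arrays.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def segment_with_prototypes(features, fg_prototypes, bg_prototypes):
    """Assign each spatial location to its best-matching category.

    features:      (H, W, D) array of target-image features (assumed precomputed).
    fg_prototypes: dict mapping category name -> (K, D) foreground prototypes.
    bg_prototypes: dict mapping category name -> (M, D) background prototypes.

    Returns (names, labels) where labels is an (H, W) int array indexing
    into names, with -1 marking background. Hypothetical helper, not OVDiff's API.
    """
    h, w, d = features.shape
    feats = l2_normalize(features.reshape(-1, d))  # (H*W, D)
    names = sorted(fg_prototypes)

    # Best cosine similarity to any foreground prototype, per category.
    fg_scores = np.stack(
        [(feats @ l2_normalize(fg_prototypes[n]).T).max(axis=1) for n in names],
        axis=1,
    )  # (H*W, C)

    # Best cosine similarity to any background prototype of any category.
    bg_scores = np.stack(
        [(feats @ l2_normalize(bg_prototypes[n]).T).max(axis=1) for n in names],
        axis=1,
    ).max(axis=1)  # (H*W,)

    labels = fg_scores.argmax(axis=1)
    # Assumed rule: background wins when it beats every foreground score.
    labels[bg_scores > fg_scores.max(axis=1)] = -1
    return names, labels.reshape(h, w)
```

With toy 3-dimensional features, a location aligned with a category's foreground prototype receives that category's index, and a location aligned with a background prototype receives -1. Because no component is trained, swapping the feature extractor only changes how `features` and the prototypes are computed.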
Primary Area: representation learning for computer vision, audio, language, and other modalities
Submission Number: 5674