Pseudo-Mask and Language: A Simple Recipe for Open-Vocabulary Semantic Segmentation

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: vision-language understanding, segmentation, open-vocabulary segmentation
TL;DR: We demonstrate that directly training a MaskFormer with pseudo-masks and language for pixel-level feature and language alignment yields superior results in open-vocabulary segmentation.
Abstract: We present a conceptually simple framework for open-vocabulary semantic segmentation, the task of accurately assigning each pixel in an image a semantic label drawn from a set of arbitrary open-vocabulary texts. Our method, P-Seg, trains a MaskFormer using pseudo-masks and language, and can be easily trained from publicly available image-text datasets. Once trained, P-Seg generalizes well to multiple test datasets without fine-tuning. Without bells and whistles, our method achieves state-of-the-art open-vocabulary semantic segmentation results on three widely used benchmarks (Pascal VOC, Pascal Context, and COCO). In addition, P-Seg scales with data and improves consistently when augmented with self-training. We believe that our simple yet effective approach will serve as a solid baseline for future research. Our code and demo will be made publicly available soon.
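To make the task in the abstract concrete, the following is a minimal sketch of per-pixel open-vocabulary labeling: each pixel feature is compared against text embeddings for a set of candidate labels, and the pixel takes the label with the highest cosine similarity. This is a generic illustration of pixel-level feature and language alignment at inference time, not P-Seg's implementation. The function names, the 2-dimensional toy embeddings, and the label set are all hypothetical; in practice the features would come from a trained segmentation model and a text encoder.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def label_pixels(pixel_features, text_embeddings):
    """Assign each pixel the label whose text embedding is most similar.

    pixel_features: H x W grid (nested lists) of D-dimensional feature vectors.
    text_embeddings: dict mapping a label string to a D-dimensional vector.
    Both inputs are assumed to live in the same embedding space.
    """
    labels = list(text_embeddings)
    segmentation = []
    for row in pixel_features:
        out_row = []
        for feat in row:
            scores = [cosine(feat, text_embeddings[name]) for name in labels]
            out_row.append(labels[scores.index(max(scores))])
        segmentation.append(out_row)
    return segmentation


# Toy example: a 2 x 2 "image" with hand-written 2-D features (hypothetical).
toy_text = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}
toy_pixels = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.7], [0.6, 0.3]],
]
segmentation = label_pixels(toy_pixels, toy_text)
print(segmentation)
```

Because the label set is just a dictionary of text embeddings, changing the vocabulary at test time only requires embedding new label strings, which is what allows evaluation on multiple datasets without fine-tuning in this kind of setup.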
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 5339