Semantic Category Discovery with Vision-language Representations

Kai Han; YANDONG LI; Sagar Vaze; Xuhui Jia

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone

Abstract: Object recognition is the task of identifying the category of an object in an image. While current models report excellent performance on existing benchmarks, most fall short of the task accomplished by the human perceptual system. For instance, traditional classifiers (e.g., those trained on ImageNet) only learn to map an image to a predefined class index, without revealing the actual semantic meaning of the object in the image. Meanwhile, vision-language models like CLIP are able to assign semantic class names to unseen objects in a 'zero-shot' manner, though they are once again provided with a predefined set of candidate names at test time. In this paper, we reconsider the recognition problem and bring it closer to a practical setting. Specifically, given only a large (essentially unconstrained) taxonomy of categories as prior information, we task a vision-language model with assigning class names to all images in a dataset. We first use non-parametric methods to establish relationships between images, which allow the model to automatically narrow down the set of possible candidate names. We then propose iteratively clustering the data and voting on class names within clusters, showing that this enables a roughly 50% improvement over the baseline on ImageNet. We demonstrate the efficacy of our method in a number of settings: using different taxonomies as the semantic search space; in unsupervised and partially supervised settings; and with coarse-grained and fine-grained evaluation datasets.
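
To make the "voting on class names within clusters" idea concrete, here is a minimal sketch of a single voting round. It is not the authors' implementation: the function name `vote_names_in_clusters`, the use of plain cosine similarity, and the assumption that cluster assignments are already available (from any clustering method) are all illustrative choices, and the embeddings are stand-ins for what a vision-language model such as CLIP would produce.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def vote_names_in_clusters(image_emb, name_emb, cluster_ids):
    """One hypothetical voting round.

    image_emb:   (n_images, d) image embeddings (assumed precomputed).
    name_emb:    (n_names, d) text embeddings of candidate class names.
    cluster_ids: (n_images,) cluster index per image (assumed given).

    Returns (per_image, voted): the nearest-name index for each image,
    and the index after a majority vote inside each cluster.
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    name_emb = l2_normalize(np.asarray(name_emb, dtype=float))
    cluster_ids = np.asarray(cluster_ids)

    # Zero-shot step: each image picks its most similar candidate name.
    per_image = (image_emb @ name_emb.T).argmax(axis=1)

    # Voting step: every image in a cluster takes the cluster's most common name.
    voted = np.empty_like(per_image)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        counts = np.bincount(per_image[members], minlength=len(name_emb))
        voted[members] = counts.argmax()
    return per_image, voted
```

In this sketch, an image whose individual nearest name disagrees with most of its cluster is relabeled with the cluster's majority name. The abstract describes repeating clustering and voting iteratively; one would presumably re-cluster and shrink the candidate name set between rounds, but those details are not given here and are left out.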

Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
