Embracing Diversity: Zero-shot Classification Beyond a Single Vector per Class

20 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: zero shot, classification, vision language models, fairness
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Improving zero-shot classification by leveraging under-utilized capabilities of VLMs to infer and explicitly account for diversity within classes
Abstract: Vision-language models (VLMs) enable, for the first time, open-world classification of objects without any retraining. While this zero-shot paradigm marks a significant advance, even today’s best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real-world objects such as pears appear in a variety of forms (from diced to whole, on a table or in a bowl), yet standard VLM classifiers map all instances of a class to a single vector based on the class label. We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector. We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining. We find that our method consistently outperforms standard zero-shot classification over a large suite of datasets encompassing hierarchies, diverse object states, and real-world geographic diversity. We also find that our method scales efficiently to a large number of attributes to account for diversity, leading to more accurate predictions for atypical instances. Finally, we highlight how our method offers fine-grained, human-interpretable explanations of model predictions. We hope this work spurs further research into the promise of zero-shot classification beyond a single class vector for capturing diversity in the world.
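To make the contrast between one vector per class and several vectors per class concrete, here is a minimal sketch in Python with hand-picked toy embeddings. Everything in it is an illustrative assumption: the function names (`cosine`, `single_vector_predict`, `multi_vector_predict`), the 3-dimensional vectors, the attribute descriptions, and the rule of scoring each class by its best-matching attribute vector. The abstract does not say how attributes are inferred or aggregated, so this should not be read as the authors' method. In practice the vectors would come from a VLM's text and image encoders.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def single_vector_predict(image: Vector, class_vectors: Dict[str, Vector]) -> str:
    """Standard zero-shot rule: one text embedding per class label."""
    return max(class_vectors, key=lambda c: cosine(image, class_vectors[c]))


def multi_vector_predict(
    image: Vector, attribute_vectors: Dict[str, Dict[str, Vector]]
) -> Tuple[str, str]:
    """Several embeddings per class, one per attribute-specific description.

    ASSUMPTION (hypothetical rule): a class is scored by its best-matching
    attribute vector (max). Returns the predicted class and the attribute
    that produced the match, which doubles as a human-readable explanation.
    """
    best_score, best_class, best_attr = -math.inf, "", ""
    for cls, attrs in attribute_vectors.items():
        for attr, vec in attrs.items():
            score = cosine(image, vec)
            if score > best_score:
                best_score, best_class, best_attr = score, cls, attr
    return best_class, best_attr


# Hypothetical 3-d toy embeddings, hand-picked for illustration only.
class_vectors = {"pear": [1.0, 0.0, 0.0], "apple": [0.0, 1.0, 0.0]}
attribute_vectors = {
    "pear": {"whole pear": [1.0, 0.0, 0.0], "diced pear": [0.0, 0.6, 0.8]},
    "apple": {"whole apple": [0.0, 1.0, 0.0], "sliced apple": [0.0, 0.8, 0.6]},
}
image = [0.1, 0.5, 0.85]  # hypothetical embedding of an atypical (diced) pear

print(single_vector_predict(image, class_vectors))     # apple
print(multi_vector_predict(image, attribute_vectors))  # ('pear', 'diced pear')
```

With these toy vectors, the single label vector for "apple" happens to lie closer to the diced-pear image, while the attribute-level vector "diced pear" recovers the correct class and names the attribute that matched. Max aggregation is only one option; averaging over a class's vectors or taking the top-k matches are equally plausible choices under the same assumptions.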
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2649