Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: zero shot, classification, vision language models, fairness
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Improving zero-shot classification by leveraging under-utilized capabilities of VLMs to infer and explicitly account for diversity within classes
Abstract: Vision-language models (VLMs) enable, for the first time, open-world classification of objects without any retraining. While this zero-shot paradigm marks a significant advance, even today's best models exhibit skewed performance when objects are dissimilar from their typical depiction. Real-world objects such as pears appear in a variety of forms (from diced to whole, on a table or in a bowl), yet standard VLM classifiers map all instances of a class to a single vector based on the class label.
We argue that to represent this rich diversity within a class, zero-shot classification should move beyond a single vector.
We propose a method to encode and account for diversity within a class using inferred attributes, still in the zero-shot setting without retraining.
We find that our method consistently outperforms standard zero-shot classification across a large suite of datasets encompassing hierarchies, diverse object states, and real-world geographic diversity.
We also find that our method scales efficiently to a large number of attributes for capturing diversity, leading to more accurate predictions for atypical instances.
Finally, we highlight how our method offers fine-grained, human-interpretable explanations of model predictions.
We hope this work spurs further research into the promise of zero-shot classification beyond a single class vector for capturing diversity in the world.
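The abstract contrasts standard zero-shot classification, which scores an image against one text vector per class, with representing each class by several attribute-conditioned vectors. The sketch below illustrates that contrast only in general terms. It assumes precomputed, L2-normalized image and text embeddings, replaced here by hand-made 3-D toy vectors rather than outputs of a real VLM. The aggregation rule (mean of the top-k attribute similarities per class) and all function names are hypothetical choices for illustration, not necessarily the authors' method.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis (works for single vectors and matrices)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def single_vector_predict(image_emb: np.ndarray, class_embs: np.ndarray) -> int:
    """Standard zero-shot rule: one text vector per class, pick the most similar.

    image_emb: (d,), class_embs: (C, d); both assumed L2-normalized,
    so the dot product equals cosine similarity.
    """
    scores = class_embs @ image_emb
    return int(np.argmax(scores))


def multi_vector_predict(image_emb: np.ndarray, attr_embs: list, k: int = 2) -> int:
    """Hypothetical multi-vector rule (illustrative assumption, not the paper's).

    attr_embs[c] is an (A_c, d) array of attribute-conditioned text vectors
    for class c (e.g. "whole pear", "diced pear"). Each class is scored by
    the mean of its k largest similarities (all of them if A_c < k).
    """
    class_scores = []
    for vecs in attr_embs:
        sims = vecs @ image_emb
        top_k = np.sort(sims)[-k:]  # np.sort is ascending, so take the tail
        class_scores.append(top_k.mean())
    return int(np.argmax(class_scores))


# Toy embeddings (hypothetical): class 0 = pear, class 1 = apple.
class_embs = normalize(np.array([[1.0, 0.0, 0.0],    # "a photo of a pear"
                                 [0.0, 1.0, 0.0]]))  # "a photo of an apple"
attr_embs = [
    normalize(np.array([[1.0, 0.0, 0.0],     # "whole pear"
                        [0.5, 0.0, 1.0]])),  # "diced pear"
    normalize(np.array([[0.0, 1.0, 0.0],     # "whole apple"
                        [0.0, 1.0, -0.5]])), # "red apple"
]
# An atypical instance: a diced pear, far from the prototypical "pear" vector.
image_emb = normalize(np.array([0.5, 0.6, 0.6]))

print(single_vector_predict(image_emb, class_embs))     # 1 (apple: misclassified)
print(multi_vector_predict(image_emb, attr_embs, k=2))  # 0 (pear)
```

With one vector per class, the toy "diced pear" embedding lies closer to the apple vector than to the pear vector. Once the pear class also carries a "diced" attribute vector, its best-matching attributes outscore the apple's. Averaging over k > 1 attributes is one plausible way to keep a single spurious attribute match from dominating a class score; k = 1 reduces to taking the maximum.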
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2649