Representational Task Bias in Zero-shot Recognition at Scale

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: vision-language models, CLIP, prompting, task representation
TL;DR: We show that CLIP image representations are biased a priori toward being used for a specific task, and provide a simple method to cue which task is desired without retraining the model.
Abstract: Research from the last year has demonstrated that vision-language pre-training at scale from incidental supervision on the Internet can produce representations with clear advantages over traditional supervised training for many computer vision tasks. We conduct an in-depth exploration of the CLIP model and find that the interface that language creates to these learned representations -- the same interface that enables zero-shot application to many tasks -- leads the model to solve tasks the user may not have intended in realistic scenarios. We call the inherent uncertainty over which task a user intends to solve in zero-shot recognition \textit{task ambiguity}. To evaluate task ambiguity, we construct a dataset of images where each image has labels for multiple semantic recognition tasks. We demonstrate that the representation produced for a given image tends to be strongly biased toward one task over the others; in other words, it exhibits \textit{task bias}. Moreover, which task a particular image will be biased toward is unpredictable, with little consistency across images. Our results show that learned visual prompts can serve as effective conditioning mechanisms for the desired task, and can even improve performance on that task when used outside the context of evaluating task ambiguity.
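The abstract's core ideas -- zero-shot recognition via image-text similarity, a representation being "biased toward" one of several valid tasks, and a visual prompt that cues the desired task -- can be sketched in a toy example. This is not the authors' code or CLIP itself: the embeddings, label sets, and the `visual_prompt` vector below are made-up stand-ins chosen only to make the mechanics concrete.

```python
# Toy illustration (hypothetical values, not from the paper): zero-shot
# recognition scores each candidate label by cosine similarity between
# an image embedding and that label's text embedding, CLIP-style.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Hypothetical text embeddings for two recognition tasks defined over
# the same image (object identity vs. scene type).
text_emb = {
    "object": {"dog": normalize([1.0, 0.2, 0.0]),
               "cat": normalize([0.9, -0.3, 0.1])},
    "scene":  {"indoors": normalize([0.0, 1.0, 0.3]),
               "outdoors": normalize([0.1, 0.8, -0.5])},
}

image_emb = normalize([0.7, 0.4, 0.2])

def predict(emb):
    # Score every label in every task; the task containing the
    # top-scoring label is the one the representation is biased toward.
    scores = {(task, lbl): dot(emb, te)
              for task, labels in text_emb.items()
              for lbl, te in labels.items()}
    (task, label), _ = max(scores.items(), key=lambda kv: kv[1])
    return task, label

print(predict(image_emb))  # -> ('object', 'dog'): biased toward "object"

# A learned visual prompt (sketched here as an additive vector applied
# to the embedding) can cue the desired task without model retraining.
visual_prompt = [0.0, 1.0, 0.0]  # made-up vector pushing toward "scene"
prompted = normalize([x + 0.8 * p
                      for x, p in zip(image_emb, visual_prompt)])
print(predict(prompted))  # -> ('scene', 'indoors'): cued toward "scene"
```

In the paper's actual setting the prompt is learned and applied in pixel space to the input image, but the conditioning effect it sketches is the same: shifting scores toward the label set of the intended task.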
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
4 Replies