Keywords: VLMs, prompting, visual prompting, self-supervision
Abstract: Large-scale Vision-Language Models (VLMs), such as CLIP, demonstrate impressive capabilities and have many applications, from text-to-image generation to zero-shot classification. Recent work has suggested that visual prompts, such as a red circle, can steer the vision encoder toward the circled region. While such visual prompts are now used in various applications, they may be model-specific and depend on the model having learned these behaviours from its training data. Discovering and evaluating candidate prompts may not be feasible across different models, tasks, and datasets. In this paper, we propose Highlight, a method that learns a visual prompt to highlight a region in an image, or that refines a manually engineered visual prompt. Using our framework, we can learn to highlight in a supervised way, using a dataset of paired text and image regions, or in an unsupervised way, using synthetic captions or images only. Highlight outperforms other visual prompts, prompt learning approaches, and compute-intensive methods that use ensembles of multiple models and visual prompts.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Submission Number: 4460