Vision-Language Subspace Prompting

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Prompt Learning; Vision Language Models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Prompting vision-language models such as CLIP to adapt to downstream tasks has recently attracted considerable attention. A seminal technique to this end is context optimization, which replaces a subset of textual tokens with trainable parameters (a.k.a. soft prompts). However, current pipelines use a single vector embedding induced by the soft prompts as the classifier weight for visual recognition. As a result, the learned soft prompts can overfit the base classes' training data, leading to poor performance on novel classes. Several approaches address this issue by regularizing the learned soft prompts to align them with hand-crafted text (hard) prompts. However, excessive regularization of the soft prompts can hurt the model's performance on the base classes it is trained on. Maintaining the right balance to ensure strong performance on both base and novel classes is crucial but non-trivial. In this paper, we introduce a novel subspace-based prompt learning method, named SuPr, which models the subspace spanned by the embeddings of both the learnable soft prompts and the textual hard prompts. This subspace-based alignment between hand-crafted and learnable prompts balances the two effects, achieving an excellent fit to the base classes as well as strong generalization to novel classes. With the advantages of subspace modelling, SuPr proves effective for base-to-novel generalization, domain generalization, cross-dataset transfer, and few-shot learning, setting new state-of-the-art results in all settings.
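To make the subspace idea concrete, below is a minimal, hypothetical sketch, not the authors' implementation: the function name, the SVD-based basis construction, and the projection-norm scoring rule are all illustrative assumptions. It represents each class by the subspace spanned by several prompt embeddings (learned soft prompts together with hand-crafted hard prompts, assumed linearly independent with K < D) and scores a test image by the norm of its CLIP embedding's projection onto each class subspace.

```python
import torch

def class_subspace_scores(image_feats, prompt_embeds):
    """Hypothetical subspace classifier for prompt embeddings.

    image_feats:   (B, D)    L2-normalized CLIP image embeddings.
    prompt_embeds: (C, K, D) K text embeddings per class, e.g. learned
                             soft prompts plus hand-crafted hard prompts.
    Returns:       (B, C)    projection-norm score per class.
    """
    scores = []
    for c in range(prompt_embeds.shape[0]):
        # Right singular vectors of the (K, D) prompt matrix give an
        # orthonormal basis of the subspace the prompts span.
        _, _, Vh = torch.linalg.svd(prompt_embeds[c], full_matrices=False)
        basis = Vh.transpose(0, 1)        # (D, K), orthonormal columns
        coords = image_feats @ basis      # (B, K) subspace coordinates
        scores.append(coords.norm(dim=-1))  # length of the projection
    return torch.stack(scores, dim=-1)    # (B, C)

# Example: 10 classes, 8 prompts each, 512-d CLIP embeddings.
imgs = torch.nn.functional.normalize(torch.randn(4, 512), dim=-1)
prompts = torch.nn.functional.normalize(torch.randn(10, 8, 512), dim=-1)
preds = class_subspace_scores(imgs, prompts).argmax(dim=-1)
```

Under this reading, training would backpropagate a classification loss on these scores into the soft-prompt tokens while the fixed hard-prompt embeddings anchor each class subspace; the paper's exact objective may differ.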
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 841