PLPP: PROMPT LEARNING WITH PERPLEXITY FOR VISION-LANGUAGE MODELS

23 Sept 2023 (modified: 25 Mar 2024) — ICLR 2024 Conference Withdrawn Submission
Keywords: Vision-Language Models, Prompt Learning, Perplexity.
TL;DR: We propose a new CoOp-based method that makes prompts more comprehensible while maintaining high accuracy on downstream tasks.
Abstract: Pre-trained vision-language (VL) models such as CLIP have demonstrated excellent performance across numerous downstream tasks. A recent method, Context Optimization (CoOp), further improves the performance of CLIP on downstream tasks by introducing prompt learning. Instead of fine-tuning CLIP with manually crafted templates (e.g., ``a photo of a \{category\}''), CoOp freezes the entire CLIP model and optimizes a set of learnable vectors, i.e., the prompt. Nonetheless, we observe that the resulting prompts are always incomprehensible, which is counter-intuitive, and existing CoOp-based methods overlook this issue. As the first work aiming at learning comprehensible prompts, this paper proposes to use perplexity to supervise the prompt learning process in the CoOp framework. Perplexity is a metric for evaluating the quality of a language model (LM) in Natural Language Processing, and we design a two-step operation to compute the perplexity of prompts. The first step computes cosine similarity to obtain labels for the learned vectors, and the second step uses a training-free LM head to output a word probability distribution. Our proposed method, \textbf{P}rompt \textbf{L}earning with \textbf{P}er\textbf{P}lexity (PLPP), can be integrated into any CoOp-based method, and experiments show that the learned prompts are much more comprehensible than those of the original CoOp and an improved variant, without sacrificing model accuracy. Codes are available at \href{https://github.com}{https://github.com}.
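To make the two-step perplexity computation concrete, below is a minimal sketch of how such a supervision term could look in PyTorch. All shapes, module names, the choice of reusing the frozen token embedding table as the training-free LM head, and the loss weighting (lambda_ppl) are assumptions for illustration only, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, n_ctx = 49408, 512, 16                         # CLIP-like sizes (assumed)
token_embedding = torch.randn(vocab_size, dim)                  # frozen CLIP token embedding table
prompt_vectors = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)   # learnable CoOp context vectors

def lm_head(x: torch.Tensor) -> torch.Tensor:
    # One way to realize a "training-free" LM head: reuse the frozen embedding
    # table as the output projection, so no extra parameters are learned.
    return x @ token_embedding.t()                               # (n_ctx, vocab_size) logits

def perplexity_loss(ctx: torch.Tensor) -> torch.Tensor:
    # Step 1: cosine similarity between each learned vector and every word
    # embedding; the most similar word serves as that vector's pseudo-label.
    sim = F.normalize(ctx, dim=-1) @ F.normalize(token_embedding, dim=-1).t()
    pseudo_labels = sim.argmax(dim=-1)                           # (n_ctx,)

    # Step 2: word probability distribution from the training-free LM head;
    # cross-entropy against the pseudo-labels is the log-perplexity of the prompt.
    logits = lm_head(ctx)
    return F.cross_entropy(logits, pseudo_labels)

print(perplexity_loss(prompt_vectors))

In a CoOp-style training loop this term would presumably be added to the usual classification loss, e.g. total_loss = ce_loss + lambda_ppl * perplexity_loss(prompt_vectors), keeping the rest of the pipeline unchanged.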
Supplementary Material: pdf
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7977