CARPRT: Class-Aware Prompt Reweighting for Pre-Trained Vision-Language Models

Ruijiang Dong; Zesheng Ye; Feng Liu; Jianzhong Qi; Gang Niu; Masashi Sugiyama

26 Sept 2024 (modified: 05 Feb 2025). Submitted to ICLR 2025. License: CC BY 4.0

Keywords: Prompt Weighting, Vision-language Models

Abstract: When a pre-trained vision-language model (VLM) is used to classify an image, it typically computes a similarity score between the image and texts containing a semantic label, e.g., “a photo of a cat”, where “a photo of a” is called a prompt and “cat” is the semantic label (a.k.a. a class in classification tasks). Existing studies have shown that the choice of prompt can significantly affect the score between a given image and a semantic label, and they have proposed a new score that uses a weighting vector to combine the scores obtained with different prompts. However, these studies assume that all classes share the same weighting vector. In this paper, we first show empirically that this approach is sub-optimal. We then revisit the existing reweighting strategy from a probabilistic view and identify an implicit assumption in prior work: the conditional independence of classes and weights, which often does not hold in practice. To address this problem, we propose class-aware prompt reweighting (CARPRT), a strategy that adjusts the weighting vector for each class. CARPRT calculates the relevance scores of prompt-class pairs with respect to all images and identifies the maximum score for each prompt-class pair. These maximum scores are then averaged across prompts for each class to estimate class-specific weighting vectors, ensuring that prompts are optimally reweighted based on class-specific information. Our experiments demonstrate that CARPRT outperforms the existing reweighting strategy on image classification tasks.
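The difference between a shared weighting vector and class-specific weighting vectors can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' implementation. It assumes that the relevance score of a prompt-class pair for an image is a precomputed, non-negative image-text similarity, and it reads "averaged across prompts" as normalizing each class's maximum scores over prompts so that they sum to one. The names `class_aware_weights`, `shared_score`, `class_aware_score`, and `predict`, as well as the toy numbers, are invented for illustration.

```python
from typing import List

# sim[i][p][c]: relevance of image i to the text "prompt p + class c".
# ASSUMPTION: precomputed by a VLM (e.g., image-text similarity) and non-negative.
Tensor3 = List[List[List[float]]]
Matrix = List[List[float]]


def class_aware_weights(sim: Tensor3) -> Matrix:
    """Hypothetical sketch: return W[c][p], one weighting vector per class."""
    n_prompts, n_classes = len(sim[0]), len(sim[0][0])
    weights = []
    for c in range(n_classes):
        # Maximum relevance of each prompt-class pair over all images.
        max_scores = [max(img[p][c] for img in sim) for p in range(n_prompts)]
        total = sum(max_scores)
        if total <= 0:
            raise ValueError(f"class {c}: maximum scores must have a positive sum")
        # ASSUMPTION: normalize over prompts so each class's weights sum to one.
        weights.append([m / total for m in max_scores])
    return weights


def shared_score(sim_img: Matrix, w: List[float]) -> List[float]:
    """Prior setting: one weighting vector w[p] shared by all classes."""
    n_classes = len(sim_img[0])
    return [sum(w[p] * sim_img[p][c] for p in range(len(w))) for c in range(n_classes)]


def class_aware_score(sim_img: Matrix, W: Matrix) -> List[float]:
    """Class-aware setting: class c uses its own weighting vector W[c]."""
    return [sum(W[c][p] * sim_img[p][c] for p in range(len(W[c]))) for c in range(len(W))]


def predict(scores: List[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: 2 images, 2 prompts, 2 classes (made-up numbers).
sim = [
    [[0.30, 0.20], [0.25, 0.28]],  # image 0: rows are prompts, columns are classes
    [[0.22, 0.26], [0.18, 0.31]],  # image 1
]
W = class_aware_weights(sim)
print([predict(class_aware_score(img, W)) for img in sim])  # [0, 1]
print([predict(shared_score(img, [0.5, 0.5])) for img in sim])  # [0, 1]
```

Dividing each class's maximum scores by their mean over prompts, rather than by their sum, multiplies every class score by the number of prompts, so the predicted class is unchanged. `shared_score` corresponds to the prior setting in which every class uses the same weighting vector.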

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6458