Keywords: vision-language model, few-shot classification, black-box, prompt learning
Abstract: With the emergence of pretrained vision-language models (VLMs), considerable effort has been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be difficult to obtain: the high cost of pretraining makes owners reluctant to release VLM weights. Consequently, model owners often opt to provide the model as a service
to safeguard model ownership. In this paper, we propose the CollAboRative pROmpt Tuning (CARROT) approach for fine-tuning black-box VLMs for downstream tasks, where we only have access to the input prompts and the output predictions of the model. Specifically, CARROT comprises two modules: a prompt generation module that learns text prompts and a prediction refinement module that enhances the output predictions in a residual style. Additionally, we introduce an auxiliary prediction-consistent loss to promote consistent optimization across these modules. To optimize the two modules, we develop a novel collaborative training algorithm that alternately optimizes the prompt generation module and the prediction refinement module via derivative-free and derivative-based methods, respectively.
Extensive experiments on few-shot classification across 15 datasets demonstrate the superiority of CARROT: it achieves a gain of about 12% in the 16-shot setting using only 8,000 queries. Moreover, CARROT trains faster and requires only about 1/80 of the memory footprint for deployment, while sacrificing only 1.62% in performance relative to the white-box method.
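To make the alternating scheme described above concrete, here is a minimal, self-contained sketch under stated assumptions: the black-box VLM is mocked by a fixed random linear map over a continuous prompt vector, simple random search stands in for the paper's derivative-free optimizer, and the exact form of the prediction-consistent loss is assumed to be a KL term between refined and raw black-box predictions. All names (black_box_vlm, refine_net, etc.) are hypothetical, not the authors' code.

```python
# Hypothetical sketch of a CARROT-style alternating optimization loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, PROMPT_DIM, FEAT_DIM = 10, 32, 128

# Fixed "weights" of the mock black-box model; only its outputs are observable.
_gen = torch.Generator().manual_seed(0)
W_BB = torch.randn(PROMPT_DIM + FEAT_DIM, NUM_CLASSES, generator=_gen)

def black_box_vlm(prompt_vec, feats):
    """Mock black-box VLM: prompt + image features -> class logits (no grads)."""
    x = torch.cat([prompt_vec.expand(feats.shape[0], -1), feats], dim=1)
    return x @ W_BB

# Prediction refinement module: a small gradient-trained net that adds a
# residual correction to the black-box logits.
refine_net = nn.Sequential(nn.Linear(NUM_CLASSES, 64), nn.ReLU(),
                           nn.Linear(64, NUM_CLASSES))
optim = torch.optim.Adam(refine_net.parameters(), lr=1e-3)

def refined(base_logits):
    return base_logits + refine_net(base_logits)  # residual-style refinement

def prompt_score(prompt_vec, feats, labels):
    """Loss of the full pipeline for a candidate prompt (queries only)."""
    with torch.no_grad():
        return F.cross_entropy(refined(black_box_vlm(prompt_vec, feats)),
                               labels).item()

# Toy few-shot data standing in for, e.g., a 16-shot classification task.
feats = torch.randn(64, FEAT_DIM)
labels = torch.randint(0, NUM_CLASSES, (64,))
prompt = torch.zeros(PROMPT_DIM)

for _round in range(20):
    # (1) Derivative-free step: perturb the prompt and keep improvements.
    best = prompt_score(prompt, feats, labels)
    for _ in range(8):  # a handful of black-box queries per round
        cand = prompt + 0.1 * torch.randn(PROMPT_DIM)
        score = prompt_score(cand, feats, labels)
        if score < best:
            prompt, best = cand, score

    # (2) Derivative-based step: update the refinement module by gradient
    # descent on cached black-box outputs.
    base = black_box_vlm(prompt, feats).detach()
    out = refined(base)
    ce = F.cross_entropy(out, labels)
    consistency = F.kl_div(F.log_softmax(out, dim=1),   # assumed form of the
                           F.softmax(base, dim=1),      # prediction-consistent
                           reduction="batchmean")       # loss
    loss = ce + 0.1 * consistency
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In the actual method, the prompt would presumably be a sequence of text-token embeddings rather than a single vector, and a more sample-efficient derivative-free optimizer (e.g., an evolution strategy) would replace the random search, but the alternation between black-box prompt updates and gradient-based refinement follows the structure summarized in the abstract.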
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5499