Keywords: vision-language model, few-shot classification, black-box, prompt learning
Abstract: With the emergence of pretrained vision-language models (VLMs), considerable effort has been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be difficult to obtain: the high cost of pretraining makes owners reluctant to release VLM weights. Consequently, model owners often opt to provide the model as a service
to safeguard model ownership. In this paper, we propose the CollAboRative pROmpt Tuning (CARROT) approach for fine-tuning black-box VLMs for downstream tasks, where we only have access to the input prompts and the output predictions of the model. Specifically, CARROT comprises two modules: a prompt generation module that learns text prompts and a prediction refinement module that enhances the output predictions in a residual style. Additionally, we introduce an auxiliary prediction-consistent loss to promote consistent optimization across these modules. To optimize the two modules, we develop a novel collaborative training algorithm that alternately optimizes the prompt generation module and the prediction refinement module via derivative-free and derivative-based methods, respectively.
Extensive experiments on few-shot classification across 15 datasets demonstrate the superiority of CARROT: it achieves a gain of about 12% in the 16-shot setting using only 8,000 queries. Moreover, CARROT trains faster and requires only about 1/80 of the memory footprint for deployment, while sacrificing only 1.62% in performance relative to the white-box method.
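To make the alternating scheme described above concrete, here is a minimal, self-contained sketch under stated assumptions: the black-box VLM is mocked by a fixed random linear map over a continuous prompt vector, simple random search stands in for the paper's derivative-free optimizer, and the exact form of the prediction-consistent loss is assumed to be a KL term between refined and raw black-box predictions. All names (black_box_vlm, refine_net, etc.) are hypothetical, not the authors' code.

```python
# Hypothetical sketch of a CARROT-style alternating optimization loop.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, PROMPT_DIM, FEAT_DIM = 10, 32, 128

# Fixed "weights" of the mock black-box model; only its outputs are observable.
_gen = torch.Generator().manual_seed(0)
W_BB = torch.randn(PROMPT_DIM + FEAT_DIM, NUM_CLASSES, generator=_gen)

def black_box_vlm(prompt_vec, feats):
    """Mock black-box VLM: prompt + image features -> class logits (no grads)."""
    x = torch.cat([prompt_vec.expand(feats.shape[0], -1), feats], dim=1)
    return x @ W_BB

# Prediction refinement module: a small gradient-trained net that adds a
# residual correction to the black-box logits.
refine_net = nn.Sequential(nn.Linear(NUM_CLASSES, 64), nn.ReLU(),
                           nn.Linear(64, NUM_CLASSES))
optim = torch.optim.Adam(refine_net.parameters(), lr=1e-3)

def refined(base_logits):
    return base_logits + refine_net(base_logits)  # residual-style refinement

def prompt_score(prompt_vec, feats, labels):
    """Loss of the full pipeline for a candidate prompt (queries only)."""
    with torch.no_grad():
        return F.cross_entropy(refined(black_box_vlm(prompt_vec, feats)),
                               labels).item()

# Toy few-shot data standing in for, e.g., a 16-shot classification task.
feats = torch.randn(64, FEAT_DIM)
labels = torch.randint(0, NUM_CLASSES, (64,))
prompt = torch.zeros(PROMPT_DIM)

for _round in range(20):
    # (1) Derivative-free step: perturb the prompt and keep improvements.
    best = prompt_score(prompt, feats, labels)
    for _ in range(8):  # a handful of black-box queries per round
        cand = prompt + 0.1 * torch.randn(PROMPT_DIM)
        score = prompt_score(cand, feats, labels)
        if score < best:
            prompt, best = cand, score

    # (2) Derivative-based step: update the refinement module by gradient
    # descent on cached black-box outputs.
    base = black_box_vlm(prompt, feats).detach()
    out = refined(base)
    ce = F.cross_entropy(out, labels)
    consistency = F.kl_div(F.log_softmax(out, dim=1),   # assumed form of the
                           F.softmax(base, dim=1),      # prediction-consistent
                           reduction="batchmean")       # loss
    loss = ce + 0.1 * consistency
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In the actual method, the prompt would presumably be a sequence of text-token embeddings rather than a single vector, and a more sample-efficient derivative-free optimizer (e.g., an evolution strategy) would replace the random search, but the alternation between black-box prompt updates and gradient-based refinement follows the structure summarized in the abstract.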
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5499