XPL: A Cross-Model Framework for Semi-Supervised Prompt Learning in Vision-Language Models

TMLR Paper 2203 Authors

15 Feb 2024 (modified: 06 Apr 2024) · Under review for TMLR
Abstract: Prompt learning, which focuses on learning soft prompts, has emerged as a promising approach for efficiently adapting pretrained vision-language models (VLMs) to multiple downstream tasks. While prior works have shown promising performance on common benchmarks, they typically rely only on labeled samples, forgoing the information available in the vast collection of otherwise unlabeled samples in the wild. To mitigate this, we propose a simple yet effective cross-model framework that leverages unlabeled samples to achieve significant gains in model performance. Specifically, we employ a semi-supervised prompt learning approach that makes the learned prompts invariant to different views of a given unlabeled sample. The multiple views are obtained by applying different augmentations to the images and by varying the lengths of the visual and text prompts attached to these samples. Applying this simple yet surprisingly effective approach to a large number of benchmark datasets, we observe a considerable improvement in the quality of the soft prompts and, in turn, substantial gains in image classification performance. Interestingly, our approach also benefits from out-of-domain unlabeled images, highlighting its robustness and generalization capabilities. Our code will be made publicly available.
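To make the cross-view consistency idea in the abstract concrete, below is a minimal sketch (not the authors' released implementation) of a consistency loss that encourages prompt-conditioned predictions for two views of the same unlabeled batch to agree, assuming a PyTorch setup with logits produced by, e.g., two augmentations or two prompt lengths; all names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def cross_view_consistency_loss(logits_view1: torch.Tensor,
                                logits_view2: torch.Tensor,
                                temperature: float = 1.0) -> torch.Tensor:
    """Symmetric KL divergence between the softmax predictions of two
    views (e.g., different augmentations, or short- vs. long-prompt
    branches) of the same unlabeled images."""
    log_p1 = F.log_softmax(logits_view1 / temperature, dim=-1)
    log_p2 = F.log_softmax(logits_view2 / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as
    # target; "batchmean" averages over the batch dimension.
    kl_12 = F.kl_div(log_p1, log_p2.exp(), reduction="batchmean")
    kl_21 = F.kl_div(log_p2, log_p1.exp(), reduction="batchmean")
    return 0.5 * (kl_12 + kl_21)

if __name__ == "__main__":
    # Toy usage: random logits stand in for the two branches' outputs
    # on the same unlabeled batch.
    batch, num_classes = 8, 10
    logits_a = torch.randn(batch, num_classes)  # e.g., short-prompt view
    logits_b = torch.randn(batch, num_classes)  # e.g., long-prompt view
    print(cross_view_consistency_loss(logits_a, logits_b).item())
```

In a full semi-supervised objective, a loss like this on unlabeled images would typically be added to the standard supervised cross-entropy on the labeled subset, weighted by a tunable coefficient.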
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhiding_Yu1
Submission Number: 2203