Keywords: Semi-supervised learning, Vision-language model
Abstract: In this work, we present SelfPrompt, a novel prompt-tuning approach for adapting vision-language models (VLMs) in a semi-supervised learning setup. Existing methods for tuning VLMs in semi-supervised setups struggle with efficient use of the limited label budget, the accumulation of noisy pseudo-labels, and proper utilization of the unlabelled data. SelfPrompt addresses these challenges by introducing (a) a weakly-supervised sampling technique that selects a diverse and representative labelled set, (b) a cluster-guided pseudo-labelling method that improves pseudo-label accuracy, and (c) a confidence-aware semi-supervised learning module that maximizes the utility of unlabelled data by learning from high- and low-confidence pseudo-labels differently. We conduct extensive evaluations across 13 datasets, where SelfPrompt significantly surpasses state-of-the-art performance, with an average improvement of 7.92% in semi-supervised learning using a 2-shot setup. Our detailed ablation studies show the effectiveness of each component.
Submission Number: 14