TL;DR: We propose a novel weakly supervised setting, True-False Label (TFL), in which vision-language models generate reliable labels. We also introduce a risk-consistent method for learning from TFLs via multi-modal prompt retrieving.
Abstract: Pre-trained **V**ision-**L**anguage **M**odels (VLMs) exhibit strong zero-shot classification abilities, demonstrating great potential for generating weakly supervised labels. Unfortunately, existing weakly supervised learning methods struggle to generate accurate labels with VLMs. In this paper, we propose a novel weakly supervised labeling setting, namely **T**rue-**F**alse **L**abels (TFLs), which can achieve high accuracy when generated by VLMs. A TFL indicates whether an instance belongs to a label that is sampled uniformly at random from the candidate label set. Specifically, we theoretically derive a risk-consistent estimator that explores and exploits the conditional probability distribution information of TFLs. In addition, we propose a convolution-based **M**ulti-modal **P**rompt **R**etrieving (MRP) method to bridge the gap between the knowledge of VLMs and target learning tasks. Experimental results demonstrate the effectiveness of the proposed TFL setting and MRP learning method. The code to reproduce the experiments is at https://github.com/Tranquilxu/TMP.
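The TFL construction described in the abstract can be illustrated with a minimal sketch. It assumes NumPy and uses ground-truth labels as a stand-in for the VLM's true/false answer; the function name `make_true_false_labels` and its parameters are hypothetical and are not taken from the released code.

```python
import numpy as np


def make_true_false_labels(true_labels, num_classes, seed=0):
    """Simulate True-False Labels (TFLs) for a batch of instances.

    Assumption: the ground-truth label plays the role of the VLM that
    answers "does this instance belong to the sampled label?".
    """
    rng = np.random.default_rng(seed)
    true_labels = np.asarray(true_labels)

    # For each instance, sample one candidate label uniformly at random.
    sampled = rng.integers(0, num_classes, size=true_labels.shape[0])

    # The TFL is 1 ("True") if the instance belongs to the sampled label,
    # and 0 ("False") otherwise.
    is_true = (sampled == true_labels).astype(np.int64)
    return sampled, is_true


if __name__ == "__main__":
    sampled, is_true = make_true_false_labels([0, 1, 2, 3, 1], num_classes=4)
    for s, t in zip(sampled, is_true):
        print(f"sampled label {s}: {'True' if t else 'False'}")
```

With uniform sampling over k candidate labels, roughly 1/k of the instances receive a "True" label in this simulation, so most of the supervision comes from "False" answers.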
Lay Summary: Pre-trained Vision-Language Models (VLMs) exhibit strong zero-shot classification abilities, demonstrating great potential for generating weakly supervised labels. In this paper, we propose a novel weakly supervised setting, True-False Label (TFL), in which vision-language models generate reliable labels. We also introduce a risk-consistent method for learning from TFLs via multi-modal prompt retrieving. Our findings show that pre-trained models can improve themselves without additional labeling.
Link To Code: https://github.com/Tranquilxu/TMP
Primary Area: General Machine Learning->Everything Else
Keywords: Weakly supervised learning; True-False labels; Multi-modal prompt retrieving; Multi-class classification
Submission Number: 3823