PSP: Prompt-Guided Self-Training Sampling Policy for Active Prompt Learning

Published: 26 Jan 2026, Last Modified: 11 Feb 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Active Prompt Learning, Reinforcement Learning, CLIP
TL;DR: We propose a Prompt-Guided Self-Training Sampling Policy (PSP) for Active Prompt Learning, which integrates Soft Actor-Critic with a customized real-pseudo hybrid reward and vectorized critics.
Abstract: Active Prompt Learning (APL) using vision-language models (\textit{e.g.}, CLIP) has attracted considerable attention for mitigating the dependence on fully labeled dataset in downstream task adaptation. However, existing methods fail to explicitly leverage prompt to guide sample selection, resulting in the selected samples being ineffective in facilitating the prompt template's downstream task adaptation, while also overlooking valuable complementary information in the unselected samples. To fill this gap, we propose a novel Prompt-Guided Self-Training Sampling Policy (PSP) for APL, which integrates Soft Actor-Critic with a customized real-pseudo hybrid reward and vectorized critics to incorporate prompts in guiding sample selection toward those that facilitate the optimization of prompt template, by jointly considering both selected and unselected samples. Specifically, PSP comprises two prominent components: Vectorized Soft Actor-Critic Sampling Policy (VSSP) and Uncertainty Augmented Self-Training (UST) mechanism. VSSP customizes a real-pseudo hybrid reward based on learned prompts and image features, which is fed into vectorized critics to estimate Q-value for each sample and compute gradients that optimize the actor, allowing it to refine its sampling policy in an End-to-End manner to identify the most informative samples for prompt learning. Moreover, UST leverages the CLIP from the previous round to generate reliable pseudo-labeled data based on uncertainty and confidence of average predictions, thereby deepening the understanding of the overall data. Extensive experiments conducted on diverse real-world datasets validate the effectiveness of our PSP.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15345
Loading