Keywords: Large language models, Prompt optimization, In-context Learning, Test-time alignment
Abstract: Large language models (LLMs) have been highly successful on diverse tasks, but some applications require specializing general-purpose LLMs to meet stricter accuracy or latency targets; here we focus on objective question answering, an important real-world setting in which a nontrivial subset of applications benefits from such specialization. Most existing methods require parameter retraining or human supervision, both of which entail high computational and data-collection burdens. To address these challenges, a direct approach is to generate ``high-confidence'' data from the unsupervised downstream task and use them as pseudo-supervision for prompt learning or in-context learning. We consider combining the two approaches for better performance; however, a naive strategy that learns the prompt first and selects pseudo-supervised examples only at inference creates a mismatch between how the prompt is learned and how it is used. In this paper, we propose unsupervised few-shot prompt learning (UFPL), which jointly learns the prompt and refines the pseudo-supervision. The learning objective aligns prompt training with usage by requiring the learned prompt to produce consistent answers when pseudo-supervised data from the downstream task are used as in-context examples. We optimize the prompt by translating gradient signals into textual critiques, which serve as feedback to iteratively refine both the prompt and the pseudo-supervision. Theoretical analysis in a simplified classification setting shows that the algorithm implicitly introduces a regularization effect, supporting its design. Empirical results on diverse benchmarks and a real-world molecule optimization task demonstrate the effectiveness of our approach.
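The abstract is high-level, so the following is only a minimal illustrative sketch of the kind of loop it describes, not the authors' implementation: pseudo-labels are kept when repeated sampling agrees (``high-confidence'' data), reused as in-context examples, and prompt/in-context inconsistencies are turned into a textual critique that refines the prompt. All names (llm, critic_llm, pseudo_label, ufpl_step, the agreement threshold) are hypothetical assumptions.

```python
# Hypothetical sketch of an unsupervised few-shot prompt-learning loop in the
# spirit of the abstract. `llm` and `critic_llm` are assumed to be callables
# that map a prompt string to a completion string; they are placeholders, not
# a specific API.
from collections import Counter
from typing import Callable, List, Optional, Tuple

def pseudo_label(llm: Callable[[str], str], prompt: str, question: str,
                 n_samples: int = 5, min_agreement: float = 0.8) -> Optional[str]:
    """Keep an answer only if repeated sampling agrees on it (high confidence)."""
    votes = Counter(llm(f"{prompt}\n\nQ: {question}\nA:") for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer if count / n_samples >= min_agreement else None

def build_icl_prompt(prompt: str, examples: List[Tuple[str, str]], question: str) -> str:
    """Prepend pseudo-supervised examples as in-context demonstrations."""
    demos = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{prompt}\n\n{demos}\n\nQ: {question}\nA:"

def ufpl_step(llm, critic_llm, prompt, questions, examples):
    """One round: compare answers with vs. without in-context examples, turn
    the disagreements into a textual critique (the 'gradient signal'), rewrite
    the prompt, and re-derive the pseudo-supervision under the new prompt."""
    disagreements = []
    for q in questions:
        direct = llm(f"{prompt}\n\nQ: {q}\nA:")
        with_icl = llm(build_icl_prompt(prompt, examples, q))
        if direct.strip() != with_icl.strip():
            disagreements.append((q, direct, with_icl))
    if not disagreements:
        return prompt, examples  # already consistent; nothing to refine
    critique = critic_llm(
        "The prompt below gives inconsistent answers with and without "
        f"in-context examples.\nPrompt: {prompt}\nCases: {disagreements}\n"
        "Rewrite the prompt so that the answers become consistent:")
    new_prompt = critique  # in practice the critic's output would be parsed/validated
    # Refine the pseudo-supervision jointly with the prompt.
    new_examples = [(q, a) for q in questions
                    if (a := pseudo_label(llm, new_prompt, q)) is not None]
    return new_prompt, new_examples or examples
```

Under these assumptions, `ufpl_step` would be called for a fixed number of rounds (or until no disagreements remain), after which the final prompt and pseudo-supervised examples are used together at inference, matching the training-time usage the abstract emphasizes.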
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 14880