Keywords: Large Language Models, Prompt Engineering, Reasoning, Automatic Prompt Generation
TL;DR: We present an automatic prompting method that needs no access to training data for the task at hand, is sample-specific, and delivers results comparable to automatic prompting methods trained on each task.
Abstract: LLMs are sensitive to prompting, with task performance often hinging on subtle,
sometimes imperceptible variations in phrasing. As a result, crafting effective
prompts manually remains challenging and time-consuming. Recent automatic
prompting methods mitigate this difficulty but face three key limitations: (i) for
each new task, they require large datasets to train good prompts; (ii) they rely on
costly optimization loops that may take hours; (iii) they typically produce a single
task-level prompt that does not adapt to the individual input problem to be solved.
We propose GPS, the first general-purpose, per-sample prompting method. Without any task-specific tuning, GPS generates a tailored prompt for each unseen
input, improving performance across diverse tasks. The prompter is trained with
reinforcement learning on a suite of training tasks and includes a novel regularization for effectively adapting to per-sample prompting. Finally, we employ Minimum Bayes Risk decoding to stabilize inference.
Empirically, GPS demonstrates competitive performance: we attain the second-best
results among baselines on text simplification, the third-best results on summarization,
and on-par results on classification, while not training on any of these tasks, in
contrast to the baselines. For in-domain prompting, we obtain state-of-the-art results on GSM8K.
Our work shows the potential of a novel and effective paradigm for automatic
prompting: generating adaptive, input-specific prompts without extensive optimization and without access to a task-specific training set. Code and data will be
released upon acceptance.
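The Minimum Bayes Risk (MBR) decoding mentioned in the abstract can be illustrated with a minimal sketch: sample several candidate outputs and keep the one with the highest average agreement with the rest. This is a generic illustration, not the paper's implementation; the consensus utility here (difflib's `SequenceMatcher` ratio) is a stand-in, as the abstract does not specify the utility function used.

```python
from difflib import SequenceMatcher


def mbr_select(candidates):
    """Pick the candidate maximizing expected utility against the others.

    A simple consensus form of MBR: each candidate is scored by its
    average similarity to all other candidates, and the highest-scoring
    one is returned. Real systems often use task metrics (e.g. ROUGE)
    as the utility function.
    """
    def utility(a, b):
        return SequenceMatcher(None, a, b).ratio()

    scores = [
        sum(utility(c, o) for j, o in enumerate(candidates) if j != i)
        for i, c in enumerate(candidates)
    ]
    return candidates[scores.index(max(scores))]


# Usage: among sampled prompts, the outlier is rejected in favor of
# the consensus candidate.
samples = ["Solve it step by step.", "Solve this step by step.", "Answer now."]
print(mbr_select(samples))
```

The intuition is that individually sampled outputs are noisy, but the candidate closest to the consensus of the sample pool tends to be the most reliable one, which is why MBR stabilizes inference.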
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 474