PRL: Prompts from Reinforcement Learning

11 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Prompt Engineering, Reasoning, Automatic Prompt Generation
TL;DR: We propose learning prompts through reinforcement learning, obtaining prompts that include newly synthesized few-shot examples and outperform other automatic prompting methods on text classification, summarization, simplification, and GSM8K.
Abstract: Effective prompt engineering remains a central challenge in fully harnessing the capabilities of LLMs. While well-designed prompts can dramatically enhance performance, crafting them typically demands expert intuition and a nuanced understanding of the task. Moreover, the most impactful prompts often hinge on subtle semantic cues that may elude human perception but are crucial for guiding LLM behavior. In this paper, we introduce PRL (Prompts from Reinforcement Learning), a novel RL-based approach for automatic prompt generation. Unlike previous methods, PRL can produce novel few-shot examples that were not seen during training. Our approach achieves state-of-the-art performance across a range of benchmarks, including text classification, simplification, summarization, and reasoning. On text classification, it surpasses APE by 2.58% and EvoPrompt by 1.00%. It also improves average ROUGE scores on summarization by 4.32 over APE and 2.12 over EvoPrompt, and the SARI score on simplification by 6.93 over APE and 6.01 over EvoPrompt. On the GSM8K mathematical reasoning benchmark, PRL further improves accuracy by 2.72% over APE and 4.53% over EvoPrompt. We will make our implementation publicly available upon acceptance.
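To make the general idea concrete, below is a minimal sketch of reinforcement-learning-driven prompt search. It is not the paper's PRL implementation: the candidate prompt pool, the `evaluate_prompt` reward stub, and the REINFORCE-style update over a fixed set of candidates are all illustrative assumptions. A real system would generate candidate prompts (including few-shot examples) with an LLM and score them with the task metric (accuracy, ROUGE, SARI).

```python
# Toy sketch of RL-based prompt selection (NOT the paper's PRL method):
# a softmax policy samples candidate prompts, each sample is scored on a
# dev set, and a REINFORCE-style update shifts probability mass toward
# high-reward prompts.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate instructions; a real system would synthesize
# these (and their few-shot examples) with an LLM.
candidates = [
    "Classify the sentiment of the text as positive or negative.",
    "Decide if the review is positive or negative. Think step by step.",
    "Label the text positive or negative. Example: 'Great film!' -> positive.",
]

def evaluate_prompt(prompt: str) -> float:
    """Stand-in reward: would be the fraction of dev examples the LLM
    answers correctly under this prompt. Dummy value for the sketch."""
    return float(rng.uniform(0.4, 0.9))

logits = np.zeros(len(candidates))  # policy parameters over candidates
lr, baseline = 0.5, 0.0

for step in range(200):
    # Softmax policy over candidate prompts.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    i = rng.choice(len(candidates), p=probs)   # sample a prompt
    reward = evaluate_prompt(candidates[i])    # score it on dev data
    advantage = reward - baseline
    baseline = 0.9 * baseline + 0.1 * reward   # running-average baseline
    # REINFORCE: grad of log pi(i) w.r.t. logits is one_hot(i) - probs.
    grad = -probs
    grad[i] += 1.0
    logits += lr * advantage * grad

print("Selected prompt:", candidates[int(np.argmax(logits))])
```

The running-average baseline is a standard variance-reduction choice for REINFORCE; swapping in a learned value baseline or a policy LLM that emits free-form prompts (as the abstract describes for PRL) changes only the sampling and update steps, not the overall loop.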
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4095