Keywords: Auto-regressive Learning, Demonstration Selection, Policy Optimization
Abstract: Effective demonstration selection is crucial for maximizing large language model (LLM) performance in few-shot in-context learning. Due to influences such as recency bias, the effectiveness of demonstrations depends heavily on their contextual relationship with the specific query and on the order in which they are presented, making demonstration selection a complex combinatorial problem. To address these two challenges, we introduce AutoSelect, a novel framework that formulates demonstration selection as an auto-regressive sequential decision process. At each step, AutoSelect embeds the query and the previously selected demonstrations into matrix representations to preserve structural information, and a trainable policy model sequentially selects the next best exemplar. To navigate the factorial space of demonstration permutations, our framework formulates a Kullback-Leibler (KL) regularized optimization problem, from which an optimal policy induces an optimal Plackett-Luce (PL) ranking over all possible demonstration sequences. Our theoretical analysis yields a principled learning objective: we prove that minimizing a tractable policy-level Cross-Entropy (CE) loss bounds the worst-case discrepancy between our policy's induced PL ranking and the optimal one, enabling tractable prioritization of high-quality sequences. Empirically, AutoSelect outperforms existing heuristic and learning-based methods across nine diverse datasets, achieving up to an 11% improvement over the strongest baseline. Our results are further supported by analytical studies and a case study, highlighting AutoSelect's key properties, as well as its transferability and generalizability.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 9904