Learning to Select In-context Examples from Reward

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Natural language processing, In-context learning, Large Language Model, GPT, LLaMA, Vicuna
TL;DR: A method that actively learns to select in-context examples conditioned on the input question.
Abstract: Large language models (LLMs) have impressive in-context learning ability: when prompted with a few examples of the same task, they can solve new questions without task-specific training. Recent studies have revealed that the selection of in-context examples can significantly affect the LLM's answer quality. In this work, we propose Reward-Guided Example Selection (ReGES), a novel method that learns from feedback to iteratively select in-context examples conditioned on the input question. Given a task and an example set, we use the MCTS algorithm to select different in-context examples, collect the LLM's outputs, and evaluate their accuracy. We then leverage an offline RL algorithm to train a value function that estimates the reward from in-context learning. During inference, we iteratively select a sequence of in-context examples for the given question based on the value function's predictions. Our method substantially improves the performance of several LLMs (Vicuna, LLaMA-2, GPT-3.5) on four benchmarks (GSM8K, StrategyQA, TREC, QNLI), and can be combined with in-context example retrieval methods for further improvement. When combined with BM25, ReGES achieves up to $+6.6$ accuracy improvement, with an average of $+2.25$, over strong baselines. Moreover, we observe consistent improvement when applying the in-context examples selected by our method to language models not used during the training phase, demonstrating its generalization ability.
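The inference procedure described in the abstract (iteratively choosing examples by querying a learned value function) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `value_fn` stands in for the trained offline-RL value function, and the toy scorer below is a hypothetical placeholder that simply counts word overlap.

```python
# Sketch of value-guided iterative example selection (ReGES-style inference).
# `value_fn` is assumed to approximate V(question, selected_context, candidate);
# the real method trains it with offline RL on MCTS-collected rollouts.

def select_examples(question, candidates, value_fn, k=4):
    """Greedily build an in-context example sequence of up to k examples."""
    selected = []
    pool = list(candidates)
    for _ in range(k):
        if not pool:
            break
        # Score each remaining candidate conditioned on the current context.
        scored = [(value_fn(question, selected, c), c) for c in pool]
        _, best = max(scored, key=lambda t: t[0])
        selected.append(best)
        pool.remove(best)
    return selected

# Hypothetical stand-in value function: reward word overlap with the question.
def toy_value_fn(question, selected, candidate):
    q_words = set(question.lower().split())
    c_words = set(candidate.lower().split())
    return len(q_words & c_words)

examples = select_examples(
    "How many apples are left after eating two?",
    ["Count apples in a basket.", "Translate to French.", "Subtract two numbers."],
    toy_value_fn,
    k=2,
)
```

Because selection is conditioned on the partial context already chosen, the value function can in principle account for interactions between examples, which independent per-example retrieval scores (e.g. BM25) cannot.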
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7511