Learning to Select In-context Examples from Reward

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: A method that learns to actively select in-context examples conditioned on the input question.
Abstract: Large language models (LLMs) have an impressive in-context learning ability: when prompted with a few examples of a task, they can solve new questions without task-specific training. Recent studies have revealed that the selection of in-context examples can significantly affect the LLM's answer quality. In this work, we propose Reward-Guided Example Selection (ReGES), a novel method that learns from feedback to iteratively select in-context examples conditioned on the input question. Given a task and an example set, we use the MCTS algorithm to select different in-context examples, collect the LLM's outputs, and evaluate their accuracy. We then train a value function with an offline RL algorithm to estimate the reward from in-context learning. During inference, we iteratively select a sequence of in-context examples for the given question based on the value function's predictions. Our method substantially improves the performance of several LLMs (Vicuna, LLaMA-2, GPT-3.5) on four benchmarks (GSM8K, StrategyQA, TREC, QNLI), and can be combined with in-context example retrieval methods for further improvement. When combined with BM25, ReGES achieves up to $+6.6$ accuracy improvement, with an average of $+2.25$, over strong baselines. Moreover, we observe consistent improvements when applying the in-context examples selected by our method to language models not used during training, demonstrating its generalization ability.
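The abstract describes an iterative, value-guided selection loop at inference time. Below is a minimal sketch of that loop, assuming a learned scoring interface: the names `select_examples` and `value_fn` are hypothetical and not from the paper, and the actual ReGES implementation (e.g., how the value function encodes the question and partial context) may differ.

```python
# Hypothetical sketch of value-guided iterative example selection as
# described in the abstract. `value_fn` stands in for the trained value
# function; its signature here is an assumption, not the paper's API.
from typing import Callable, List, Sequence


def select_examples(
    question: str,
    pool: Sequence[str],
    value_fn: Callable[[str, List[str], str], float],
    k: int = 4,
) -> List[str]:
    """Iteratively pick k in-context examples conditioned on the question.

    value_fn(question, selected_so_far, candidate) is assumed to return
    the estimated reward (e.g., expected answer accuracy) of appending
    `candidate` to the current in-context example sequence.
    """
    selected: List[str] = []
    remaining = list(pool)
    for _ in range(min(k, len(remaining))):
        # Score every remaining candidate given the partial selection,
        # then greedily commit to the highest-valued one.
        best = max(remaining, key=lambda ex: value_fn(question, selected, ex))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy step-by-step structure mirrors the abstract's "iteratively select a sequence of in-context examples"; whether the paper uses pure greedy selection or a search procedure at inference is not specified here.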
Paper Type: long
Research Area: Question Answering
Contribution Types: NLP engineering experiment
Languages Studied: English