Research Area: Alignment, Data, Learning algorithms for LMs
Keywords: question asking, preference elicitation, expert iteration, self-improvement
TL;DR: We explore a language model's ability to self-improve by rewarding the model for generating useful questions.
Abstract: When prompting language models to complete a task, users often leave important aspects unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models often struggle to ask good questions. We explore a language model's ability to self-improve (STaR; Zelikman et al., 2022) by rewarding the model for generating useful questions—a simple method we dub STaR-GATE. We generate a synthetic dataset of 25,500 unique persona-task prompts to simulate conversations between a pretrained language model—the $\texttt{Questioner}$—and a $\texttt{Roleplayer}$ whose preferences are unknown to the $\texttt{Questioner}$. By asking questions, the $\texttt{Questioner}$ elicits preferences from the $\texttt{Roleplayer}$. The $\texttt{Questioner}$ is iteratively finetuned on questions that increase the probability of high-quality responses to the task, which are generated by an $\texttt{Oracle}$ with access to the $\texttt{Roleplayer}$'s latent preferences. After two iterations of self-improvement, the $\texttt{Questioner}$ asks better questions, allowing it to generate responses that are preferred over responses from the initial model on $\textbf{72}$% of tasks. Our results indicate that teaching a language model to ask better questions leads to better personalized responses.
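The abstract's self-improvement loop can be summarized as: simulate an elicitation dialogue per persona-task prompt, reward questions by how much the dialogue raises the probability of the Oracle's gold response, then finetune the Questioner on the best dialogues. Below is a minimal sketch of that loop in Python; all model calls (questioner_ask, roleplayer_answer, oracle_response, logprob_of_response, finetune) are hypothetical toy stand-ins for illustration, not the authors' implementation, and the hyperparameters (turns, keep_frac) are assumptions.

```python
import random


def questioner_ask(task: str, dialogue: list[str]) -> str:
    # Toy stand-in: a real system samples a clarifying question from the Questioner LM.
    return f"Clarifying question about '{task}' (turn {len(dialogue) // 2 + 1})?"


def roleplayer_answer(question: str, persona: str) -> str:
    # Toy stand-in: the Roleplayer answers from its hidden persona/preferences.
    return f"As {persona}: my answer to '{question}'."


def oracle_response(task: str, persona: str) -> str:
    # Toy stand-in: the Oracle sees the latent preferences and writes a gold response.
    return f"Gold response to '{task}' tailored to {persona}."


def logprob_of_response(task: str, dialogue: list[str], gold: str) -> float:
    # Toy stand-in: a real system scores log p(gold | task, elicited dialogue).
    return random.random()


def finetune(training_pairs: list[tuple[str, list[str]]]) -> None:
    # Toy stand-in: a real system finetunes the Questioner on the kept dialogues.
    print(f"Finetuning on {len(training_pairs)} (task, dialogue) pairs")


def star_gate_loop(prompts, iterations=2, turns=3, keep_frac=0.25):
    """Expert-iteration-style loop: elicit, score, keep the best, finetune."""
    for _ in range(iterations):
        scored = []
        for task, persona in prompts:
            gold = oracle_response(task, persona)
            dialogue: list[str] = []
            for _ in range(turns):
                question = questioner_ask(task, dialogue)
                answer = roleplayer_answer(question, persona)
                dialogue += [question, answer]
            # Reward: how much the elicited dialogue raises p(gold response).
            scored.append((logprob_of_response(task, dialogue, gold), task, dialogue))
        scored.sort(key=lambda x: x[0], reverse=True)
        kept = scored[: max(1, int(keep_frac * len(scored)))]
        finetune([(task, dialogue) for _, task, dialogue in kept])


if __name__ == "__main__":
    demo_prompts = [
        ("plan a weekend trip", "a budget traveler"),
        ("recommend a book", "a science-fiction fan"),
    ]
    star_gate_loop(demo_prompts)
```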
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 260