In-Context Personalized Alignment with Feedback History under Counterfactual Evaluation

Published: 10 Jun 2025, Last Modified: 30 Jun 2025, MoFA Poster, CC BY 4.0
Keywords: in-context learning, personalization, large language models
Abstract: Accommodating the diverse preferences of users is an emerging challenge in large language model (LLM) alignment. A prevalent solution is to prompt LLMs with past user feedback from earlier conversations, so that LLMs can infer user preferences and adapt their generations accordingly. In this paper, we revisit this in-context LLM personalization paradigm under a synthetic counterfactual evaluation setup, where each candidate response can be the preferred response depending on the user's preferences. We examine whether model responses can be steered toward diverse preferences given distinct feedback histories provided in-context. Our experiments suggest that off-the-shelf LLMs struggle to infer user preferences from in-context feedback, both for personalized reward modeling and for response generation. We show that fine-tuning is almost necessary for in-context feedback to be leveraged effectively, with fine-tuned 7-8B LLMs improving over off-the-shelf LLMs. Lastly, we further improve fine-tuned response generation models via rejection sampling of training data guided by the personalized reward model.
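The sketch below illustrates, under assumptions, the two components the abstract describes: serializing a user's feedback history into an in-context prompt, and rejection sampling of candidate responses guided by a personalized reward score. It is not the authors' implementation; the names (FeedbackTurn, build_prompt, select_by_reward) and the prompt format are hypothetical, and the toy reward function stands in for a fine-tuned personalized reward model.

```python
# Minimal sketch (not the paper's code) of in-context personalization from
# feedback history plus reward-guided rejection sampling. All names are
# illustrative assumptions, not an API from the paper.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FeedbackTurn:
    """One earlier conversation turn with the user's stated preference."""
    query: str
    chosen: str    # response the user preferred
    rejected: str  # response the user disliked


def build_prompt(history: List[FeedbackTurn], new_query: str) -> str:
    """Serialize feedback history so an LLM can infer the user's preference in-context."""
    lines = ["Past feedback from this user:"]
    for turn in history:
        lines.append(f"Query: {turn.query}")
        lines.append(f"Preferred response: {turn.chosen}")
        lines.append(f"Dispreferred response: {turn.rejected}")
    lines.append(f"New query: {new_query}")
    lines.append("Respond in the style this user prefers.")
    return "\n".join(lines)


def select_by_reward(candidates: List[str], reward_fn: Callable[[str], float]) -> str:
    """Rejection sampling: keep the candidate scored highest by a (personalized) reward model."""
    return max(candidates, key=reward_fn)


if __name__ == "__main__":
    history = [
        FeedbackTurn(
            query="Explain quicksort.",
            chosen="Short bullet-point explanation.",
            rejected="Long formal proof-style explanation.",
        )
    ]
    print(build_prompt(history, "Explain binary search."))

    # A real reward_fn would be a fine-tuned personalized reward model;
    # this toy scorer simply prefers shorter candidates.
    best = select_by_reward(
        ["A terse answer.", "A much longer and more formal answer..."],
        reward_fn=lambda s: -len(s),
    )
    print(best)
```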
Submission Number: 58