Inference-Time Personalized Alignment with a Few User Preference Queries

Published: 18 Sept 2025, Last Modified: 12 Dec 2025 · NeurIPS 2025 poster · CC BY-NC-SA 4.0
Keywords: personalized alignment, inference-time alignment, user preferences, best of N, best-arm identification, logistic bandits
TL;DR: We propose a novel inference-time personalized alignment method that elicits the user's preferences with a few preference queries.
Abstract: We study the problem of aligning a generative model's responses with a user's preferences. Recent works have proposed several formulations for personalized alignment; however, they either require a large number of user preference queries or require that the preference be explicitly specified as text input. In this paper, we propose a novel inference-time personalized alignment method, UserAlign, that elicits the user's preferences with a few pairwise response-comparison queries. In particular, UserAlign builds on the theoretical framework of best-arm identification in logistic bandits and selects a personalized response from a fixed pool of the model's generated responses. The key idea is to treat the user's feedback as consistent and noise-free, and to incorporate it into the theoretical framework to identify the best response quickly. Experimental results across several tasks, involving personalized text and image generation, showcase the effectiveness of UserAlign in achieving personalized alignment.
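To illustrate why the noise-free assumption makes query-efficient selection possible, the following is a minimal sketch (not the authors' UserAlign algorithm, which uses best-arm identification in logistic bandits): when pairwise feedback is consistent and noise-free, the best of N candidate responses can be found with at most N-1 comparison queries via sequential elimination. The `prefer` callback and the length-based simulated user below are illustrative assumptions.

```python
def identify_best(responses, prefer):
    """Return the best response and the number of queries used.

    prefer(a, b) -> True if the user prefers response a over b.
    With consistent, noise-free feedback, the current winner can
    never be beaten by a response it already beat transitively,
    so a single elimination pass suffices.
    """
    best = responses[0]
    queries = 0
    for candidate in responses[1:]:
        queries += 1  # one pairwise comparison query to the user
        if prefer(candidate, best):
            best = candidate
    return best, queries

# Simulated noise-free user whose hidden preference favors longer responses.
pool = ["ok", "a longer reply", "the most detailed response here"]
best, n_queries = identify_best(pool, lambda a, b: len(a) > len(b))
# best == "the most detailed response here", n_queries == 2
```

With noisy feedback this guarantee breaks down, which is why the bandit machinery is needed in the general case; the noise-free assumption is what lets the query count stay small.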
Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)
Submission Number: 23739