T-POP: Test-Time Personalization with Online Preference Feedback

Qu Zikun; Min Zhang; Mingze Kong; Xiang Li; Zhiwei Shang; Zhiyong Wang; Yikun Ban; Shuang Qiu; Yao Shu; Zhongxiang Dai

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Test-Time Alignment, Dueling Bandits, Preference Feedback

TL;DR: We introduce a test-time algorithm that personalizes a frozen LLM for new users by leveraging dueling bandits to efficiently learn from their online pairwise preference feedback.

Abstract: Personalizing large language models (LLMs) to individual user preferences is a critical step beyond generating generically helpful responses. However, current personalization methods are ill-suited for new users: they typically require either slow, resource-intensive fine-tuning or a substantial amount of pre-existing user data, which creates a significant cold-start problem. To address this challenge, we introduce a new paradigm for real-time personalization that learns from online pairwise preference feedback collected during text generation. We propose T-POP (Test-Time Personalization with Online Preference Feedback), a novel algorithm that combines test-time alignment with dueling bandits. Without updating the LLM parameters, T-POP steers the decoding process of a frozen LLM by learning, online, a reward function that captures user preferences. By leveraging dueling bandits, T-POP selects its queries to the user so as to balance exploring their preferences against exploiting the learned knowledge to generate personalized text. Extensive experiments demonstrate that T-POP achieves rapid and data-efficient personalization, significantly outperforming existing baselines and improving consistently with more user interactions.
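The abstract does not specify the reward model, the candidate features, or the pair-selection rule, so the following is only a minimal sketch of the general idea of learning a reward function online from pairwise feedback with a dueling-bandit selector. It is not the T-POP algorithm. It assumes a linear Bradley-Terry reward over unit-norm feature vectors and a UCB-style rule for choosing the challenger; the class `DuelingPreferenceLearner`, the function `run_simulation`, the hyperparameters, and the simulated user are all hypothetical. In a real system the candidates would presumably be continuations decoded from the frozen LLM, with features from some embedding, which this sketch replaces with random vectors.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class DuelingPreferenceLearner:
    """Linear Bradley-Terry reward model with a UCB-style duel selector.

    Illustrative assumption only; features are expected to have unit norm.
    """

    def __init__(self, dim, reg=1.0, lr=0.5, bonus=1.0):
        self.theta = np.zeros(dim)    # reward parameters, r(x) = theta @ x
        self.V = reg * np.eye(dim)    # regularized design matrix of past duels
        self.reg, self.lr, self.bonus = reg, lr, bonus
        self.duels = []               # list of (winner_features, loser_features)

    def select_pair(self, feats):
        """Pick two distinct candidate indices (needs at least two candidates)."""
        scores = feats @ self.theta
        first = int(np.argmax(scores))        # exploit: current best guess
        diff = feats - feats[first]           # feature gaps to the first arm
        V_inv = np.linalg.inv(self.V)
        width = np.sqrt(np.einsum("id,de,ie->i", diff, V_inv, diff))
        ucb = scores + self.bonus * width     # explore: optimistic challenger
        ucb[first] = -np.inf                  # force a different second arm
        second = int(np.argmax(ucb))
        return first, second

    def update(self, winner, loser, steps=20):
        """Record one duel, then refit theta by gradient ascent on the
        L2-regularized Bradley-Terry log-likelihood of all duels so far."""
        z = winner - loser
        self.duels.append((winner, loser))
        self.V += np.outer(z, z)
        Z = np.array([w - l for w, l in self.duels])
        for _ in range(steps):
            grad = Z.T @ (1.0 - sigmoid(Z @ self.theta)) - self.reg * self.theta
            self.theta += self.lr * grad / len(Z)


def run_simulation(rounds=30, dim=5, n_candidates=8, seed=0):
    """Simulate a user whose hidden preference vector drives noisy duel outcomes.

    Returns the cosine similarity between learned and hidden parameters.
    """
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=dim)         # hidden user preference (simulated)
    learner = DuelingPreferenceLearner(dim)
    for _ in range(rounds):
        feats = rng.normal(size=(n_candidates, dim))
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-norm features
        i, j = learner.select_pair(feats)
        p_i_wins = sigmoid(3.0 * (feats[i] - feats[j]) @ true_theta)
        if rng.random() < p_i_wins:
            learner.update(feats[i], feats[j])
        else:
            learner.update(feats[j], feats[i])
    cos = learner.theta @ true_theta / (
        np.linalg.norm(learner.theta) * np.linalg.norm(true_theta) + 1e-12
    )
    return float(cos)
```

Calling `run_simulation()` runs 30 simulated duels and returns how well the learned direction matches the hidden one. The first arm is the current greedy choice and the second is the candidate with the highest optimistic score relative to it, which is one common way to trade off exploitation and exploration in linear dueling bandits; other selection rules would fit the same interface.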

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 18170
