Keywords: Retrieval-augmented Generation, Personalization, Contextual Bandit
Abstract: Large Language Models (LLMs) excel at general-purpose tasks, but personalizing their responses to individual users remains challenging.
Retrieval augmentation offers a lightweight alternative to fine-tuning by conditioning LLMs on user history records, yet existing strategies rely on heuristics (e.g., relevance to the query) that overlook each record's actual contribution to personalization.
Through a systematic motivation study, we show that (i) relevance does not reliably predict utility, and (ii) utility is non-monotonic across records: the best user profile is not simply the combination of the best individual records, and adding more records can even hurt performance.
To address these limitations, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization.
PURPLE operates as a re-ranking layer over candidate records, balancing efficiency with personalization quality.
Across nine real-world personalization tasks spanning classification, regression, and short- and long-text generation, PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines, establishing contextual bandit retrieval as a principled and scalable solution for personalized LLMs.
Our anonymized code is available.
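The abstract describes PURPLE as a contextual bandit re-ranking layer over candidate history records. The paper's exact algorithm is not specified here; as a purely illustrative sketch of what such a layer could look like, the snippet below implements a LinUCB-style bandit that scores records by an upper confidence bound on their estimated utility and re-ranks the top-k into the user profile. The feature choices, the `LinUCBReranker` class, and the reward signal are all assumptions, not PURPLE's actual method.

```python
# Hypothetical sketch of a contextual-bandit re-ranking layer (LinUCB-style).
# This is NOT PURPLE's published algorithm; features and rewards are assumed.
import numpy as np


class LinUCBReranker:
    """Scores candidate user-history records via UCB over a linear reward model."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.alpha = alpha         # exploration strength
        self.A = np.eye(dim)       # ridge-regression Gram matrix
        self.b = np.zeros(dim)     # reward-weighted feature accumulator

    def score(self, x: np.ndarray) -> float:
        """Upper confidence bound on the expected utility of one record."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b     # current estimate of the reward weights
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def rerank(self, features: np.ndarray, k: int) -> np.ndarray:
        """Return indices of the top-k candidate records by UCB score."""
        scores = np.array([self.score(x) for x in features])
        return np.argsort(scores)[::-1][:k]

    def update(self, x: np.ndarray, reward: float) -> None:
        """Fold in the observed downstream utility of a selected record."""
        self.A += np.outer(x, x)
        self.b += reward * x


# Toy usage: 5 candidate records with 4-d features (e.g., query relevance,
# recency, length, past utility -- illustrative feature choices only).
rng = np.random.default_rng(0)
bandit = LinUCBReranker(dim=4)
feats = rng.normal(size=(5, 4))
chosen = bandit.rerank(feats, k=2)   # records placed into the user profile
for i in chosen:
    # In practice the reward would come from evaluating the personalized
    # LLM output (e.g., task accuracy); here it is random for illustration.
    bandit.update(feats[i], reward=rng.uniform())
```

A per-record bandit like this fits the abstract's framing: utility is learned from observed personalization outcomes rather than assumed from relevance, and re-ranking keeps the retrieval stage cheap.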
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2510