Keywords: Retrieval-augmented Generation, Personalization, Contextual Bandit
Abstract: Large Language Models (LLMs) excel at general-purpose tasks, but personalizing their responses to individual users remains challenging.
Retrieval augmentation offers a lightweight alternative to fine-tuning by conditioning LLMs on user history records, yet existing strategies rely on heuristics (e.g., relevance to the query) that overlook each record's actual contribution to personalization.
Through a systematic motivation study, we show that (i) relevance does not reliably predict utility, and (ii) utility is non-monotonic across records: the best user profile is not simply the combination of the best individual records, and adding more records can even hurt performance.
To address these limitations, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization.
PURPLE operates as a re-ranking layer over candidate records, balancing efficiency with personalization quality.
Across nine real-world personalization tasks spanning classification, regression, and short- and long-text generation, PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines, establishing contextual bandit retrieval as a principled and scalable solution for personalized LLMs.
Our anonymized code is available.
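The abstract describes PURPLE as a contextual bandit re-ranking layer over candidate history records. The paper's exact algorithm is not specified here; as a purely illustrative sketch of what such a layer could look like, the snippet below implements a LinUCB-style bandit that scores records by an upper confidence bound on their estimated utility and re-ranks the top-k into the user profile. The feature choices, the `LinUCBReranker` class, and the reward signal are all assumptions, not PURPLE's actual method.

```python
# Hypothetical sketch of a contextual-bandit re-ranking layer (LinUCB-style).
# This is NOT PURPLE's published algorithm; features and rewards are assumed.
import numpy as np


class LinUCBReranker:
    """Scores candidate user-history records via UCB over a linear reward model."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.alpha = alpha         # exploration strength
        self.A = np.eye(dim)       # ridge-regression Gram matrix
        self.b = np.zeros(dim)     # reward-weighted feature accumulator

    def score(self, x: np.ndarray) -> float:
        """Upper confidence bound on the expected utility of one record."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b     # current estimate of the reward weights
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def rerank(self, features: np.ndarray, k: int) -> np.ndarray:
        """Return indices of the top-k candidate records by UCB score."""
        scores = np.array([self.score(x) for x in features])
        return np.argsort(scores)[::-1][:k]

    def update(self, x: np.ndarray, reward: float) -> None:
        """Fold in the observed downstream utility of a selected record."""
        self.A += np.outer(x, x)
        self.b += reward * x


# Toy usage: 5 candidate records with 4-d features (e.g., query relevance,
# recency, length, past utility -- illustrative feature choices only).
rng = np.random.default_rng(0)
bandit = LinUCBReranker(dim=4)
feats = rng.normal(size=(5, 4))
chosen = bandit.rerank(feats, k=2)   # records placed into the user profile
for i in chosen:
    # In practice the reward would come from evaluating the personalized
    # LLM output (e.g., task accuracy); here it is random for illustration.
    bandit.update(feats[i], reward=rng.uniform())
```

A per-record bandit like this fits the abstract's framing: utility is learned from observed personalization outcomes rather than assumed from relevance, and re-ranking keeps the retrieval stage cheap.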
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2510