Keywords: LLM Personalization, Retrieval Augmented Generation, Reranking
Abstract: Large Language Models (LLMs) excel at general-purpose tasks, but personalizing their responses to individual users remains challenging. Retrieval augmentation offers a lightweight alternative to fine-tuning by conditioning LLMs on user history records, yet existing strategies rely on heuristics (e.g., relevance to the query) that overlook the true contribution of records to personalization. To address this limitation, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. PURPLE operates as a re-ranking layer over candidate records, balancing efficiency with personalization quality. Across nine real-world personalization tasks spanning classification, regression, and short- and long-text generation, PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines, establishing contextual bandit retrieval as a principled and scalable solution for personalized LLMs. Our code is available at: https://anonymous.4open.science/r/ACL-2026-PURPLE-3096/.
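To make the "contextual bandit as a re-ranking layer" idea concrete, the sketch below shows a generic LinUCB-style reranker over candidate user-history records. This is only an illustrative sketch, not the PURPLE algorithm from the paper: the feature vectors, the `alpha` exploration weight, and the scalar personalization reward are all placeholder assumptions.

```python
import numpy as np

class LinUCBReranker:
    """Illustrative LinUCB-style contextual bandit for re-ranking
    user-history records before they are passed to an LLM prompt.
    Generic sketch only; not the paper's PURPLE implementation."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.alpha = alpha            # exploration strength (assumed hyperparameter)
        self.A = np.eye(dim)          # ridge-regression Gram matrix
        self.b = np.zeros(dim)        # reward-weighted feature accumulator

    def scores(self, X: np.ndarray) -> np.ndarray:
        """Upper-confidence score for each candidate feature row in X."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                          # current reward model
        mean = X @ theta                                # predicted contribution
        bonus = self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", X, A_inv, X))      # uncertainty bonus
        return mean + bonus

    def rerank(self, X: np.ndarray, k: int) -> np.ndarray:
        """Indices of the top-k records by UCB score."""
        return np.argsort(-self.scores(X))[:k]

    def update(self, x: np.ndarray, reward: float) -> None:
        """Fold in the observed personalization reward for one selected record."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

In use, each candidate record is featurized (e.g., with query relevance and recency signals), the top-k records under the UCB score are placed in the prompt, and the downstream personalization quality is fed back via `update`, so records that actually improve outputs are ranked higher over time.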
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, reinforcement learning, human-AI interaction/cooperation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 7216