Optimizing User Profiles via Contextual Bandits for Retrieval-Augmented LLM Personalization

ACL ARR 2026 January Submission 7216 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: LLM Personalization, Retrieval Augmented Generation, Reranking
Abstract: Large Language Models (LLMs) excel at general-purpose tasks, but personalizing their responses to individual users remains challenging. Retrieval augmentation offers a lightweight alternative to fine-tuning by conditioning LLMs on user history records, yet existing strategies rely on heuristics (e.g., relevance to the query) that overlook the true contribution of records to personalization. To address these limitations, we propose PURPLE, a contextual bandit framework that oPtimizes UseR Profiles for Llm pErsonalization. PURPLE operates as a re-ranking layer over candidate records, balancing efficiency with personalization quality. Across nine real-world personalization tasks spanning classification, regression, and short- and long-text generation, PURPLE consistently outperforms strong heuristic and retrieval-augmented baselines, establishing contextual bandit retrieval as a principled and scalable solution for personalized LLMs. Our code is available at: https://anonymous.4open.science/r/ACL-2026-PURPLE-3096/.
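The abstract describes PURPLE as a contextual bandit operating as a re-ranking layer over candidate user-history records. The paper's exact algorithm is not reproduced here; the sketch below is only a minimal, hypothetical illustration of contextual-bandit re-ranking in the LinUCB style (all class and variable names are assumptions, not from the paper): each candidate record is an "arm" described by a feature vector, records are ranked by an upper-confidence-bound score, and downstream personalization feedback updates a shared linear reward model.

```python
import numpy as np

class LinUCBReranker:
    """Illustrative LinUCB-style contextual-bandit re-ranker (not PURPLE itself).

    Each candidate user-history record is an arm with a feature vector
    (e.g., an embedding of query + record). Records are ranked by an
    upper-confidence-bound score; reward feedback from the downstream
    personalization task updates the ridge-regression model.
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha              # exploration strength
        self.A = np.eye(dim)            # ridge-regression Gram matrix
        self.b = np.zeros(dim)          # reward-weighted feature sum

    def rank(self, features):
        """Return candidate indices sorted by UCB score, best first."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # current reward-model estimate
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in features]
        return sorted(range(len(features)), key=lambda i: -scores[i])

    def update(self, x, reward):
        """Incorporate the observed reward for one selected record."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

In this sketch the re-ranked order would feed the top-k records into the LLM prompt, and the task outcome (classification accuracy, generation quality, etc.) would serve as the bandit reward.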
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: retrieval-augmented generation, reinforcement learning, human-AI interaction/cooperation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 7216