A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

ICLR 2026 Conference Submission17127 Authors

19 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Benchmark, Personalization, Conversation
Abstract: We present \textsc{PersonaConvBench}, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that treats personalization and conversational structure in isolation, \textsc{PersonaConvBench} tightly integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text generation, across 10 diverse Reddit-based domains. This design enables systematic analysis of how personalized conversational context shapes LLM outputs in realistic, multi-user conversational scenarios. We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized conversational history yields substantial performance boosts; for example, it achieves a 198\% relative gain over the best non-conversational baseline in sentiment classification. By releasing \textsc{PersonaConvBench} with comprehensive evaluations and code, we aim to facilitate research on LLMs that can adapt to individuals' conversational styles, track long-term context, and generate more contextually rich and engaging responses.
Primary Area: datasets and benchmarks
Submission Number: 17127