Keywords: Personalized Conversation, Benchmark
Abstract: We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that treats personalization or conversational structure in isolation, PersonaConvBench tightly integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text generation, spanning 10 diverse Reddit-based domains. This design enables systematic analysis of how personalized conversational context shapes LLM outputs in realistic, multi-user scenarios. We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized conversational history yields substantial performance gains; for example, it achieves a 198% relative improvement over the best non-conversational baseline in sentiment classification. By releasing PersonaConvBench with comprehensive evaluations and code, we aim to facilitate research on LLMs that can adapt to individual conversational styles, track long-term context, and generate contextually rich, engaging responses.
Submission Number: 88