Keywords: large language model, digital twin, persona simulation
Abstract: Large Language Models (LLMs) are exhibiting emergent human-like abilities and are increasingly envisioned as the foundation for simulating a specific individual's communication style, behavioral tendencies, and personality traits.
However, current evaluations of LLM-based persona simulation remain limited: most rely on synthetic dialogues, lack a systematic framework, and do not analyze which capabilities the task requires.
To address these limitations, we introduce TwinVoice, a comprehensive benchmark for assessing persona simulation across diverse real-world contexts.
TwinVoice encompasses three dimensions: Social Persona (public social interactions), Interpersonal Persona (private dialogues), and Narrative Persona (role-based expression).
The ability of LLMs to simulate personas is further decomposed into six fundamental capabilities: opinion consistency, memory recall, logical reasoning, lexical fidelity, persona tone, and syntactic style.
Experimental results reveal that while advanced models achieve moderate accuracy, they still fall short of sustaining consistent persona simulation and are particularly weak in syntactic style and memory recall.
Our data, code, and evaluation results are available at https://anonymous.4open.science/r/TwinVoice-B08E.
Primary Area: datasets and benchmarks
Submission Number: 738