APEE: Assessing the Personality Expressions of LLM-Driven Role Play Agent Beyond Self-Perception

Published: 2025 · Last Modified: 21 Jan 2026 · CSCWD 2025 · CC BY-SA 4.0
Abstract: Large language models (LLMs) have demonstrated significant progress in role-playing tasks, yet evaluating their ability to simulate personality traits remains a challenge. Traditional questionnaire-based methods from psychology have been used to assess LLMs' personality traits. However, these approaches have limitations when applied to LLM-driven role-playing agents (RPAs), as they are designed for humans and rely on stable, self-assessed personality traits. To bridge this gap, we extend simple self-perception questionnaires to more objective, real-world evaluations. In this paper, we introduce APEE, a new dataset consisting of 473 instances across three real-world scenario types: practical goal planning, social media behavior, and leaderless group discussions (LGD). In addition to evaluating whether LLMs adhere to predefined character traits, we introduce two key metrics: Stability and Differentiation. These metrics assess how consistently LLMs express personality traits across different scenarios (Stability) and how effectively they differentiate their behavior when assuming multiple roles (Differentiation). We conducted experiments on 339 different roles using 11 advanced LLMs with the APEE dataset, and discuss the impact of factors such as model size and architecture. Code and dataset are available at https://github.com/linkseed18612254945/APEE_Personality.
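As a rough illustration only (the abstract does not give the paper's actual formulas), one plausible reading of the two metrics is: Stability as low per-trait variance of a role's scores across the three scenario types, and Differentiation as the average pairwise distance between the trait profiles different roles express. A minimal sketch under those assumptions:

```python
import statistics
from itertools import combinations

# Hypothetical sketch: these are NOT the paper's definitions, just one way
# the described metrics could be operationalized. We assume each role gets a
# vector of trait scores in [0, 1], measured separately in each scenario.

def stability(scores_by_scenario):
    """1 minus the mean per-trait standard deviation across scenarios:
    higher means the role's traits are expressed more consistently."""
    n_traits = len(scores_by_scenario[0])
    devs = [
        statistics.pstdev(s[t] for s in scores_by_scenario)
        for t in range(n_traits)
    ]
    return 1 - statistics.mean(devs)

def differentiation(scores_by_role):
    """Mean pairwise L1 distance between role-level trait vectors:
    higher means the model plays different roles more distinctly."""
    pairs = combinations(scores_by_role, 2)
    return statistics.mean(
        sum(abs(a - b) for a, b in zip(u, v)) / len(u) for u, v in pairs
    )

# Example: one role scored in three scenarios
# (goal planning, social media behavior, LGD), three traits each.
role_scores = [[0.8, 0.2, 0.6], [0.7, 0.3, 0.6], [0.9, 0.2, 0.5]]
print(round(stability(role_scores), 3))

# Two roles with clearly distinct two-trait profiles.
print(round(differentiation([[0.9, 0.1], [0.2, 0.8]]), 3))
```

Any concrete instantiation would depend on how trait scores are elicited per scenario, which the abstract leaves to the paper body.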