Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Abstract: A Role-Playing Agent (RPA) is an increasingly popular type of LLM agent that simulates human-like behaviors across a variety of tasks.
But how should we evaluate an RPA?
This is challenging because of the wide variety of task requirements and the diverse designs of RPAs.
This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs, derived from a systematic review of 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis synthesizes six agent attributes, seven task attributes, and seven evaluation metrics from the existing literature. Based on these findings, we construct an RPA evaluation design guideline to support future researchers in designing their evaluations in a more systematic and consistent manner.
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: Role-playing agent, LLM agent, evaluation, survey
Contribution Types: Surveys
Languages Studied: English
Submission Number: 8060