Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Abstract: A Role-Playing Agent (RPA) is an increasingly popular type of LLM agent that simulates human-like behaviors across a variety of tasks.
But how should we evaluate an RPA?
This is challenging because of the wide variety of task requirements and the diverse designs of RPAs.
This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs, derived from a systematic review of 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis synthesizes six agent attributes, seven task attributes, and seven evaluation metrics from the existing literature. Based on these findings, we construct an RPA evaluation design guideline to support future researchers in designing their evaluations in a more systematic and consistent manner.
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: Role-playing agent, LLM agent, evaluation, survey
Contribution Types: Surveys
Languages Studied: English
Submission Number: 8060