MPIRD-LLMA: Exploring How Large Language Models Perform in Complex Social Interactive Environments

ACL ARR 2024 August Submission 72 Authors

12 Aug 2024 (modified: 19 Sept 2024), ACL ARR 2024 August Submission, CC BY 4.0
Abstract: Large Language Models (LLMs) have demonstrated significant potential in environmental perception and reasoning-based decision-making. Building on the progress of LLM-based agents in simulating complex human behaviours, this paper introduces a more challenging dataset, the Multiverse Phantasm Interactive Role-play Dataset (MPIRD-LLMA), along with a corresponding simulation framework. We also design evaluation methods, including LLM-Score and Rouge-L, to assess agents' performance. The dataset centres on murder mystery games and contains eight meticulously designed scripts covering a range of themes and styles. Each character has a detailed background story and a complex network of interpersonal relationships. The dataset features 50 unique characters, with more than five characters per script on average. The scripts total more than 400,000 words, with an average background description of more than 1,000 words per character. The evaluation results show that current LLM-based agents cannot handle a simulation as complex as MPIRD-LLMA, and that RLHF and safety alignment can limit the potential of LLM-based agents.
Paper Type: Short
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Computational Social Science and Cultural Analytics, Dialogue and Interactive Systems
Contribution Types: Data resources
Languages Studied: Chinese, English
Submission Number: 72
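
The abstract names Rouge-L as one of the evaluation metrics. As a rough illustration only, and not the authors' implementation, Rouge-L can be sketched as an F-measure over the longest common subsequence (LCS) of candidate and reference tokens. The function names, the whitespace tokenisation, and the default beta = 1 are assumptions made for this sketch; Chinese text would need character-level or segmenter-based tokenisation instead.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Rouge-L F-measure with whitespace tokenisation (an assumption here)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # Weighted harmonic mean; beta = 1 weights precision and recall equally.
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
```

A call such as rouge_l(agent_answer, reference_answer) returns a score between 0 and 1, where both argument names are hypothetical placeholders for an agent's output and a ground-truth answer.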