Abstract: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, assessing their social intelligence has received little attention. In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both the individual and group levels. The benchmark is constructed from a variety of sources and covers 512 characters and 6,420 question prompts spanning 1,480 diverse conversation scenarios, with 30,871 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream open-source and closed-source LLMs, confirming its value as a testbed for assessing the sociality of role-playing conversational agents.
Paper Type: short
Research Area: Dialogue and Interactive Systems
Languages Studied: English, Chinese