Entering Real Social World! Benchmarking the Theory of Mind and Socialization Capabilities of LLMs from a First-person Perspective

Guiyang Hou; Wenqi Zhang; Yongliang Shen; Zeqi Tan; Sihao Shen; Weiming Lu

28 Sept 2024 (modified: 16 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Theory of Mind, Socialization, First-person Perspective

Abstract: In the social world, humans can infer and reason about others' mental states (such as emotions, beliefs, and intentions), a capability known as Theory of Mind (ToM). At the same time, humans' own mental states evolve in response to social situations, a capability we refer to as socialization. Together, these capabilities form the foundation of human social interaction. In the era of artificial intelligence (AI), and especially with the development of large language models (LLMs), we raise two questions: How do LLMs perform in terms of ToM and socialization capabilities? And, more broadly, can these AI models truly enter and navigate the real social world? Existing research evaluates LLMs' ToM and socialization capabilities by positioning LLMs as passive observers from a third-person perspective, rather than as active participants. However, compared to the third-person perspective, observing and understanding the world from an egocentric first-person perspective is the natural approach for both humans and AI agents. The ToM and socialization capabilities of LLMs from a first-person perspective, a crucial attribute for advancing embodied AI agents, remain unexplored. To answer these questions and bridge this research gap, we introduce EgoSocialArena, a novel framework designed to evaluate and investigate the ToM and socialization capabilities of LLMs from a first-person perspective. It encompasses two evaluation environments, a static environment and an interactive environment, with seven scenarios: Daily Life, Counterfactual, New World, Blackjack, Number Guessing, and Limit Texas Hold’em, totaling 2,195 data entries. With EgoSocialArena, we conduct a comprehensive evaluation of nine advanced LLMs and report key insights regarding the future development of LLMs as well as the capability levels of the most advanced LLMs currently available.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 13262
