Abstract: Theory of Mind (ToM) is the ability to understand others' mental states and is essential for human social interaction. Although recent studies suggest that large language models (LLMs) exhibit human-level ToM capabilities, the underlying mechanisms remain unclear. "Simulation Theory", widely discussed in cognitive science, posits that we infer others' mental states by simulating their cognitive processes. In this work, we propose a framework for investigating whether the ToM mechanism in LLMs is based on Simulation Theory by analyzing their internal representations. Following this framework, we successfully control LLMs' ToM reasoning through modeled perspective-taking and counterfactual interventions. Our results provide initial evidence that state-of-the-art LLMs implement an emergent ToM partially based on Simulation Theory, suggesting parallels between human and artificial social reasoning.
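The abstract mentions counterfactual interventions on internal representations but does not specify the procedure. Below is a minimal sketch of the general kind of intervention alluded to (patching hidden states from a counterfactual input into a forward pass), assuming a HuggingFace-style causal LM. The model choice ("gpt2"), layer index, token position, the false-belief sentences, and all function names are illustrative assumptions, not the authors' actual setup.

```python
# Sketch of a counterfactual hidden-state intervention (activation patching).
# All concrete choices (model, layer, position, prompts) are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; the paper's models are not specified here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def hidden_at_layer(text, layer_idx):
    """Return hidden states after transformer block `layer_idx` for `text`."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # +1 because hidden_states[0] is the embedding output
    return out.hidden_states[layer_idx + 1]  # shape: (1, seq_len, d_model)

def patched_logits(text, layer_idx, pos, replacement):
    """Re-run `text`, overwriting the block output at (layer_idx, pos)
    with `replacement`, and return the final-token logits."""
    def hook(module, inputs, output):
        hs = output[0] if isinstance(output, tuple) else output
        hs = hs.clone()
        hs[:, pos, :] = replacement
        return (hs,) + output[1:] if isinstance(output, tuple) else hs

    block = model.transformer.h[layer_idx]  # GPT-2 layout; differs per model
    handle = block.register_forward_hook(hook)
    try:
        ids = tok(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**ids).logits[0, -1]
    finally:
        handle.remove()
    return logits

# Hypothetical usage: inject activations from a counterfactual ("true-belief")
# continuation into a false-belief prompt and inspect the model's prediction.
layer, pos = 6, -1
counterfactual = hidden_at_layer("Sally saw the ball moved to the box.", layer)[:, -1, :]
logits = patched_logits("Sally thinks the ball is in the", layer, pos, counterfactual)
print(tok.decode(logits.argmax().item()))
```

If the prediction shifts toward the counterfactual world state after patching, that is the sort of evidence one would take as consistent with a simulation-like mechanism; the actual criteria used in the paper are not described in this abstract.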
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: interpretability, cognitive modeling, computational psycholinguistics, probing, counterfactual/contrastive explanations
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3484