Wait... Was That a Sign? Reading Minds Through Actions: Observable Theory of Mind with Nonverbal Cues

ACL ARR 2025 February Submission 3495 Authors

15 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: Our ability to interpret others' mental states through nonverbal cues (NVCs) has been fundamental to human survival and social cohesion. Existing Theory of Mind (ToM) benchmarks have focused primarily on false-belief tasks and reasoning under asymmetric information, overlooking mental states beyond belief and the rich tapestry of human nonverbal communication. We present ObservableToM, a comprehensive framework for evaluating the ToM capabilities of machines in interpreting NVCs. Starting from an FBI agent's validated profile handbook, we develop OToMtext, a dialogue dataset of 9,896 entries spanning diverse contexts, and OToMvideo, a carefully curated video dataset with fine-grained annotations of actions and their psychological interpretations. Our evaluation reveals that current AI systems struggle significantly with NVC interpretation, showing not only a substantial performance gap (GPT-4o: 73.6% vs. human: 91.5%) but also a pattern of over-interpretation, with particularly low precision (40.0–63.5%) indicating high false-alarm rates.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Multimodality and Language Grounding to Vision, Robotics and Beyond, Dialogue and Interactive Systems
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 3495