Reading the Air: Evaluating Field Intelligence of LLMs in Social Dynamics

ACL ARR 2026 January Submission1610 Authors

30 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Social Intelligence, Large Language Models, Theory of Mind
Abstract: Socially embedded LLM agents must not only interpret what is said, but also infer latent motives, track group-level atmosphere, and choose actions that remain normatively appropriate under uncertainty. We present GroupMind, a benchmark for evaluating Field Intelligence via a progressive three-stage chain: Subtext Deciphering, Atmosphere Recognition, and Social Appropriateness. GroupMind contains 3,084 multi-turn, high-tension social interactions spanning seven scenario families, constructed with a sociology simulation pipeline that instantiates interaction topologies and applies LLM-assisted generation with consensus and human verification. We evaluate models under controlled factors of information visibility and conversational noise, and introduce the Holistic Social Success Rate (HSR) to measure end-to-end reliability across the full cognition-to-action loop. Experiments on 20 LLMs reveal a consistent knowledge–action gap: strong subtask accuracy does not reliably translate into socially appropriate decisions, with the best model achieving only 70.0% HSR in the easiest setting and dropping to 55.2% under combined constraints. Code and data are available at https://anonymous.4open.science/r/Groupmind-EA56.
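The abstract does not give the exact formula for HSR, but one plausible reading, consistent with "end-to-end reliability across the full cognition-to-action loop," is a joint success rate: an interaction counts only if the model passes all three stages. The sketch below is an assumption, not the paper's definition; the stage names and the `holistic_success_rate` helper are illustrative.

```python
# Hedged sketch of a Holistic Social Success Rate (HSR)-style metric.
# ASSUMPTION: an interaction is a holistic success only if the model
# succeeds on all three stages of the chain (Subtext Deciphering,
# Atmosphere Recognition, Social Appropriateness). The paper may
# define HSR differently.
from typing import Dict, List

STAGES = ("subtext", "atmosphere", "appropriateness")

def holistic_success_rate(results: List[Dict[str, bool]]) -> float:
    """Fraction of interactions where every stage succeeded."""
    if not results:
        return 0.0
    holistic = sum(all(r[s] for s in STAGES) for r in results)
    return holistic / len(results)

# Toy example: 4 interactions, 2 of which pass all three stages.
demo = [
    {"subtext": True,  "atmosphere": True,  "appropriateness": True},
    {"subtext": True,  "atmosphere": False, "appropriateness": True},
    {"subtext": True,  "atmosphere": True,  "appropriateness": True},
    {"subtext": False, "atmosphere": True,  "appropriateness": False},
]
print(holistic_success_rate(demo))  # 0.5
```

Under this reading, HSR is strictly upper-bounded by the accuracy of the weakest stage, which is one way a "knowledge–action gap" can surface: high per-stage accuracy can still yield a much lower joint success rate.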
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: evaluation and metrics, task-oriented, dialogue state tracking
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 1610