Danger Depends on the Mind: A Theory-of-Mind Grounded Dataset and Model for Context-Dependent Dangerous Speech

Danger Depends on the Mind: A Theory-of-Mind Grounded Dataset and Model for Context-Dependent Dangerous Speech

ACL ARR 2026 January Submission24 Authors

21 Dec 2025 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Dangerous Speech Detection, Theory-of-Mind, Context-Dependent Risk, ToM-DS, ToMGuard

Abstract: Dangerous speech detection is a well-studied task, but existing approaches typically treat utterances in isolation, relying on binary labels that ignore who is speaking and in what mental state. We formulate a context-dependent variant of this task by grounding it in Theory-of-Mind (ToM). In cognitive science, ToM studies how humans attribute latent mental states-such as emotions, intentions, and actions-to others. We argue that such states are key signals for assessing the risk of an utterance. Building on this view, we construct ToM-DS, a 79K-instance dataset where each utterance is paired with structured speaker profiles, ToM states (emotion, intent, action), and topic hierarchies. During data construction, we first identify context-dependent sentences and generate diverse safe and dangerous scenarios surrounding them. High-quality annotations are obtained with state-of-the-art LLMs and a multi-stage cross-agent validation pipeline, yielding a comprehensive and reliable resource for context-dependent dangerous speech detection and fine-grained risk level classification. We further propose ToMGuard, a lightweight model with a dynamic ToM attention mechanism that adaptively weighs different mental-state cues. ToMGuard outperforms strong proprietary and open-source LLMs with significantly fewer parameters. Experimental results show that ToMGuard sets a new benchmark for context-dependent dangerous speech detection and risk level classification on ToM-DS.

Paper Type: Long

Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP

Research Area Keywords: human-AI interaction/cooperation, human-in-the-loop, human factors in NLP

Contribution Types: Model analysis & interpretability, Data resources, Data analysis

Languages Studied: Chinese

Submission Number: 24

Loading