Danger Depends on the Mind: A Theory-of-Mind Grounded Dataset and Model for Context-Dependent Dangerous Speech
Keywords: Dangerous Speech Detection, Theory-of-Mind, Context-Dependent Risk, ToM-DS, ToMGuard
Abstract: Dangerous speech detection is a well-studied task, but existing approaches typically treat utterances in isolation, relying on binary labels that ignore who is speaking and in what mental state. We formulate a context-dependent variant of this task by grounding it in Theory-of-Mind (ToM). In cognitive science, ToM studies how humans attribute latent mental states (such as emotions, intentions, and actions) to others. We argue that such states are key signals for assessing the risk of an utterance.
Building on this view, we construct ToM-DS, a 79K-instance dataset in which each utterance is paired with structured speaker profiles, ToM states (emotion, intent, action), and topic hierarchies. During data construction, we first identify context-dependent sentences and then generate diverse safe and dangerous scenarios surrounding them. High-quality annotations are obtained with state-of-the-art LLMs and a multi-stage cross-agent validation pipeline, yielding a comprehensive and reliable resource for context-dependent dangerous speech detection and fine-grained risk level classification.
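As a rough illustration of the record structure described above, one ToM-DS instance might be represented as in the sketch below. The class names, field names, risk scale, and example values are all assumptions made for illustration; the abstract does not specify the released schema, and the example sentences are invented rather than taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToMState:
    # The three ToM dimensions named in the abstract.
    emotion: str
    intent: str
    action: str


@dataclass
class ToMDSInstance:
    # Hypothetical field names; not the released ToM-DS schema.
    utterance: str
    speaker_profile: Dict[str, str]   # structured speaker attributes
    tom_state: ToMState
    topic_path: List[str] = field(default_factory=list)  # coarse-to-fine topics
    is_dangerous: bool = False
    risk_level: int = 0               # assumed ordinal scale, 0 = no risk


def is_context_dependent_pair(a: ToMDSInstance, b: ToMDSInstance) -> bool:
    """True when two instances share an utterance but differ in danger label."""
    return a.utterance == b.utterance and a.is_dangerous != b.is_dangerous


# Invented example: the same sentence in a safe and a dangerous scenario.
safe = ToMDSInstance(
    utterance="I know where you live.",
    speaker_profile={"role": "friend"},
    tom_state=ToMState(emotion="cheerful", intent="offer help", action="drop off a gift"),
    topic_path=["daily life", "visiting"],
)
dangerous = ToMDSInstance(
    utterance="I know where you live.",
    speaker_profile={"role": "stranger"},
    tom_state=ToMState(emotion="angry", intent="intimidate", action="threaten"),
    topic_path=["conflict", "threats"],
    is_dangerous=True,
    risk_level=2,
)
```

The pairing check reflects the central idea of the abstract: the utterance text is identical, and only the speaker profile and ToM states separate the safe case from the dangerous one.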
We further propose ToMGuard, a lightweight model with a dynamic ToM attention mechanism that adaptively weighs different mental-state cues. Despite having significantly fewer parameters, ToMGuard outperforms strong proprietary and open-source LLMs. Experimental results on ToM-DS show that ToMGuard sets a new benchmark for context-dependent dangerous speech detection and risk level classification.
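The idea of adaptively weighing mental-state cues can be illustrated with a minimal sketch. The abstract does not describe ToMGuard's actual mechanism, so the code below substitutes plain scaled dot-product attention over three cue vectors as a stand-in; the function names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def tom_attention(
    utterance_vec: List[float], cue_vecs: Dict[str, List[float]]
) -> Tuple[Dict[str, float], List[float]]:
    """Weigh each ToM cue by its scaled similarity to the utterance.

    Stand-in for illustration only, not ToMGuard's published mechanism.
    Returns the per-cue weights and the weighted sum of the cue vectors.
    """
    names = list(cue_vecs)
    scale = math.sqrt(len(utterance_vec))
    weights = softmax([dot(utterance_vec, cue_vecs[n]) / scale for n in names])
    fused = [
        sum(w * cue_vecs[n][i] for w, n in zip(weights, names))
        for i in range(len(utterance_vec))
    ]
    return dict(zip(names, weights)), fused


# Toy 2-dimensional vectors, invented for the example.
weights, fused = tom_attention(
    [1.0, 0.0],
    {"emotion": [1.0, 0.0], "intent": [0.0, 1.0], "action": [0.0, 0.0]},
)
```

In this toy case the emotion cue aligns with the utterance vector and therefore receives the largest weight, while the intent and action cues receive equal, smaller weights.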
Paper Type: Long
Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP
Research Area Keywords: human-AI interaction/cooperation, human-in-the-loop, human factors in NLP
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: Chinese
Submission Number: 24