AntiHateAgent: A Knowledge-Augmented Evidence-Based Reasoning Agent for Implicit Hate Speech Detection

ACL ARR 2026 January Submission 6080 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Hate Speech Detection, Implicit Hate Speech, LLM Agents, Retrieval-Augmented Reasoning, Chain-of-Thought
Abstract: Hate speech poses a growing challenge to online platforms, particularly as it becomes increasingly implicit and subtle. While recent advances in machine learning have improved automated detection, existing methods largely rely on static internal knowledge or explicit semantics, leading to a critical knowledge gap and a lack of comprehensive reasoning. To address these limitations, we propose AntiHateAgent, an agent-based framework for hate speech detection that enables structured reasoning, contextual knowledge integration, and transparent decision-making, thereby improving robustness and reliability in real-world content moderation scenarios. Experimental results show that AntiHateAgent significantly improves implicit hate speech detection. Across three datasets, it achieves up to a 22.3% increase in overall F1 score and up to a 43.6% improvement in recall on hate samples. The framework is particularly effective at detecting newly emerging implicit hate that relies on cultural context, reaching 89.7% recall on the implicit hate subset of the latest 4chan dataset. Its evidence-driven reasoning process also ensures explainability and transparency in decision-making.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: hate speech detection
Contribution Types: Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 6080