Aligning Implied Statements for Implicit Hate Speech Generalizability with Context-Bounded Semi-hard Negative Mining

ACL ARR 2026 January Submission8371 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: NLP Applications, Computational Social Science and Cultural Analytics, Ethics, Bias, and Fairness, Interpretability and Analysis of Models for NLP
Abstract: Classifying implicit hate speech remains a challenge: intent is often masked through insinuation and context rather than explicit slurs. Prior supervised contrastive approaches improve in-domain detection but can overfit surface cues and struggle to transfer across datasets. We propose \textsc{ImpSH}, a triplet-based framework that aligns posts with implied statements when available and uses context-bounded semi-hard negatives to focus learning on near confusions. We also examine \textsc{AugSH}, which forms positives via data augmentation. In controlled evaluations on \textsc{IHC}, \textsc{SBIC}, and \textsc{DynaHate} with \textsc{BERT} and \textsc{HateBERT}, \textsc{ImpSH} is a viable alternative to standard supervised contrastive baselines and often improves cross-domain performance under matched preprocessing and tuning budgets. Representation analysis using alignment and uniformity indicates tighter positive pairs with balanced global spread, and qualitative nearest-neighbor case studies illustrate typical false negatives under domain shift. These results suggest that aligning posts with their implied statements via context-bounded mining provides a more stable, bijective-like mapping to related insinuations, mitigating the volatility inherent in traditional clustering-based representation learning.\footnote{Code will be released under the MIT license upon acceptance.}
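To make the semi-hard negative mining step concrete, the sketch below shows the standard selection rule for triplet learning: a negative counts as semi-hard when it is farther from the anchor than the positive (the implied statement) but still within the margin. This is an illustrative reconstruction, not the authors' released code; the function name, the Euclidean distance choice, and the `candidates` pool standing in for the paper's context-bounded constraint are all assumptions.

```python
import math

def semi_hard_negatives(embeddings, anchor, positive, candidates, margin=0.2):
    """Select semi-hard negatives for a triplet anchored at `anchor`.

    `candidates` restricts the negative pool; in the paper this restriction
    is context-bounded, whose exact definition is not given in the abstract.
    """
    # Distance from the anchor post to its positive (implied statement).
    d_ap = math.dist(embeddings[anchor], embeddings[positive])
    picked = []
    for c in candidates:
        d_an = math.dist(embeddings[anchor], embeddings[c])
        # Semi-hard: farther than the positive, but still inside the margin,
        # so the triplet loss is nonzero without being dominated by outliers.
        if d_ap < d_an < d_ap + margin:
            picked.append(c)
    return picked
```

In practice such mining is run per batch over model embeddings; here plain lists of coordinates keep the rule itself visible.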
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: NLP Applications, Computational Social Science and Cultural Analytics, Ethics, Bias, and Fairness, Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 8371