Investigating the Hate–Credibility Nexus Across Datasets and Content Formats

Published: 22 Sept 2025, Last Modified: 22 Sept 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: Hate speech detection, Low-credibility content, Content moderation, Computational social science
Abstract: The relationship between hate speech and low-credibility content (LCC) has been widely debated, with prior work suggesting that false headlines contain more hateful language. In this work, we replicate and extend Mosleh et al.’s analysis using state-of-the-art, context-aware hate speech scoring and open-source toxicity detection, thereby improving the explanatory power sixfold (R² = 4.4% vs. 0.7%). Applying the enhanced method to the WeLFake dataset reveals a striking reversal: real news contains more hate in both short- and long-form content (β = +0.82, +0.84; p < 0.001). Further analysis reveals that this difference is driven by direct hate expression in real news, whereas LCC primarily employs reported speech, suggesting a strategic use of plausible deniability. These findings challenge assumptions about the hate–credibility nexus and highlight the need for framing-aware, cross-format moderation systems that distinguish between direct and contextual hate expression.
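A minimal sketch of the kind of analysis the abstract describes: score each text with an open-source toxicity detector, then regress the score on a real-vs-fake label to estimate the direction and explanatory power of the hate-credibility link. This is not the authors' code; the model choice (Detoxify), the file name, and the column names are assumptions for illustration only.

```python
# Sketch only: open-source toxicity scoring + OLS on a real/fake indicator.
# Assumed inputs: a CSV with a "text" column and a binary "label" column (1 = real news).
import pandas as pd
import statsmodels.api as sm
from detoxify import Detoxify

df = pd.read_csv("welfake_sample.csv")          # hypothetical file name
scores = Detoxify("original").predict(df["text"].tolist())
df["toxicity"] = scores["toxicity"]             # per-text toxicity score in [0, 1]

X = sm.add_constant(df["label"])                # intercept + real-vs-fake indicator
model = sm.OLS(df["toxicity"], X).fit()
print(model.params["label"], model.rsquared)    # sign of the slope and explained variance (R^2)
```

A positive slope under this setup would correspond to real news carrying higher toxicity scores than low-credibility content, which is the direction of the reversal reported on WeLFake.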
Submission Number: 137