Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement

Published: 2024, Last Modified: 12 May 2025, Venue: IEEE Big Data 2024, License: CC BY-SA 4.0
Abstract: Content moderation on social media is increasingly difficult because hate speech evolves rapidly and continually changes form to evade detection. To improve detection accuracy, current methods often rely on auxiliary data such as target labels, which specify the particular group targeted by hate speech. While target labels can enhance model performance, they are often scarce, inconsistent across platforms, and unable to capture the full spectrum of hate speech variations. To overcome these limitations, we introduce HATE-WATCH, a novel weakly supervised framework that adapts to the fluid nature of hate speech without relying heavily on explicit target labels. Using confidence-based reweighting and contrastive regularization, HATE-WATCH disentangles input features into universal and platform-specific representations, enabling robust detection even when detailed target labels are absent. This approach advances cross-platform hate speech detection, offering a more adaptable and scalable solution that contributes to safer online communities by addressing the real-world complexities of content moderation.