GranulGAN: Data Augmentation for Granular Hate Speech Detection via Generative Adversarial Networks

ACL ARR 2025 May Submission6514 Authors

20 May 2025 (modified: 05 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The algorithmic detection of hate speech is an ongoing challenge in online environments. One fundamental problem is the class imbalance within labeled datasets, which stems from the diverse nature of hate speech. This work proposes GranulGAN, a novel framework designed to augment imbalanced datasets for granular hate speech detection. It utilizes a GPT-based generator, a context-based domain adaptor, and a reward system integrating multiple polarities. Furthermore, we explore the difficulty of evaluating partially generated sequences, a known limitation in training GANs for text generation, where assessment typically requires complete sequences. As an alternative, we discuss leveraging LLMs for auto-completion, enabling more effective handling of incomplete text during generation. Results from a wide range of experiments demonstrate the superiority of auto-completion by LLMs and show that GranulGAN outperforms multiple baseline augmentation approaches in both binary and granular hate speech detection tasks. On modern datasets, GranulGAN consistently achieves the highest scores in both Hate-F1 and Macro-F1. Lastly, an ablation study assesses the importance and contribution of the different polarities in the proposed reward system.
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: data augmentation, adversarial training, transfer learning / domain adaptation, reinforcement learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Keywords: data augmentation, adversarial training, transfer learning / domain adaptation, reinforcement learning
Submission Number: 6514