Abstract: The algorithmic detection of hate speech is an ongoing challenge in online environments. One fundamental problem is the class imbalance within labeled datasets. The diverse nature of hate speech is at the core of this imbalance problem. This work proposes GranuGAN, a novel framework designed to augment imbalanced datasets for granular hate speech classification.
GranuGAN combines a GPT-2-based generator, a context-based domain adaptor, and a reward system that integrates multiple polarities. An alternative way of handling partial sequences, auto-completion by LLMs, is also discussed. A wide range of experiments verifies the efficacy of LLM auto-completion in handling partial sequences and evaluates GranuGAN on both binary and multi-class hate speech detection tasks. The results demonstrate the superiority of LLM auto-completion and show that GranuGAN outperforms the compared approaches on binary hate speech detection tasks. GranuGAN consistently achieves the highest Hate-F1 and Macro-F1 scores on modern datasets when compared with multiple baseline augmentation approaches. An ablation study assesses the contribution of each polarity in the proposed reward system, and a case study illustrates the quality of the generated hateful texts.
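The abstract reports Hate-F1 and Macro-F1 without defining them. As a minimal sketch, assuming Hate-F1 means the F1 score of the hate class alone and Macro-F1 means the unweighted mean of per-class F1 scores, the two metrics could be computed as follows. The function names, the label strings, and the toy predictions are illustrative assumptions, not taken from the paper.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1 score of a single class, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention chosen here: a class that is never seen or predicted scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels that occur."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy data: an imbalanced binary task with "hate" as the minority class.
y_true = ["hate", "none", "none", "none", "hate", "none"]
y_pred = ["hate", "none", "none", "hate", "none", "none"]

hate_f1 = f1_for_class(y_true, y_pred, "hate")  # assumed meaning of "Hate-F1"
overall_macro_f1 = macro_f1(y_true, y_pred)
print(f"Hate-F1 = {hate_f1:.3f}, Macro-F1 = {overall_macro_f1:.3f}")
```

Under these assumptions, reporting the two metrics together is a natural fit for imbalanced data: the minority-class F1 isolates performance on the rare hate class, while the macro average weights every class equally regardless of its frequency.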
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: data augmentation, adversarial training, transfer learning / domain adaptation, reinforcement learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 4583