CADA: A Counterfactual Adversarial Data Augmentation Framework for Low-Resource Hate Speech Detection

Bo Zhang, Junyu Lu, Liang Yang, Bo Xu, Hongfei Lin

Published: 2025, Last Modified: 21 Jan 2026, NLPCC (3) 2025, CC BY-SA 4.0

Abstract: Detecting hate speech in low-resource languages presents a significant challenge. Existing research typically employs data augmentation to alleviate the scarcity of annotated data. However, these methods fail to generate diverse and high-quality samples. In this paper, we propose a novel Counterfactual Adversarial Data Augmentation Framework (CADA) to detect hate speech in low-resource languages, consisting of three key components: generator, discriminator, and classifier. Specifically, the generator first introduces a counterfactual strategy to generate diverse augmented samples using large language models. The discriminator, enhanced with adversarial calibration, then verifies the quality of the generated data to improve the reliability of the augmentation process. The classifier utilizes the augmented data to enhance the understanding and detection of hate speech. We conduct extensive experiments on hate speech datasets across 12 low-resource languages. The results demonstrate that the proposed CADA outperforms existing state-of-the-art methods. Ablation studies further confirm the effectiveness of each component in the framework.
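The generate-then-verify loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`augment`, `toy_generate`, `toy_score`), the acceptance threshold, and the placeholder tokens `[HATE]` and `[NEUTRAL]` are all hypothetical. The generator here is a rule-based stand-in for a large language model, the discriminator is a simple consistency check, and adversarial calibration is not modeled.

```python
from typing import Callable, List, Tuple

# (text, label) pair; label 1 = hateful, 0 = not hateful (assumed encoding).
Sample = Tuple[str, int]


def augment(
    seed: List[Sample],
    generate: Callable[[str, int], List[Sample]],
    score: Callable[[str, int], float],
    threshold: float = 0.5,  # hypothetical acceptance cutoff
) -> List[Sample]:
    """Return the seed data plus generated samples the discriminator accepts."""
    augmented = list(seed)
    for text, label in seed:
        for cand_text, cand_label in generate(text, label):
            # Keep a candidate only if the discriminator rates it reliable.
            if score(cand_text, cand_label) >= threshold:
                augmented.append((cand_text, cand_label))
    return augmented


def toy_generate(text: str, label: int) -> List[Sample]:
    """Stand-in generator: small edits that aim to flip the label."""
    if label == 1:
        return [(text.replace("[HATE]", "[NEUTRAL]"), 0)]
    # The second candidate is deliberately mislabeled to exercise the filter.
    return [(text + " [HATE]", 1), (text, 1)]


def toy_score(text: str, label: int) -> float:
    """Stand-in discriminator: 1.0 if the label matches the marker, else 0.0."""
    predicted = 1 if "[HATE]" in text else 0
    return 1.0 if predicted == label else 0.0


seed_data = [("they are [HATE]", 1), ("nice weather today", 0)]
augmented_data = augment(seed_data, toy_generate, toy_score)
print(len(augmented_data))
```

In a pipeline of this shape, the filtered output would then serve as training data for the classifier; that step is omitted here because the abstract does not specify the model or training procedure.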

External IDs: dblp:conf/nlpcc/ZhangLYXL25