Enhancing implicit hate speech detection via LLM-generated adversarial samples

Lang Zhang, Hongtao Deng, Wang Gao, Yang Yu, Rui Xu

Published: 2025, Last Modified: 08 Jan 2026 | Int. J. Syst. Assur. Eng. Manag. 2025 | CC BY-SA 4.0

Abstract: As the amount of hate speech on social media increases, the demand for its automatic detection grows more urgent. Current research focuses primarily on explicit hate speech, while detecting subtler, implicit forms remains a significant challenge. Standard classifiers often miss these covert forms because their pragmatic and semantic features are less obvious. In this paper, we propose a novel framework, the adversarial implicit hate speech generator (AIHSG), which uses large language models to generate adversarial short texts containing implicit hate speech. These samples may show no evident signs of hate speech on the surface but convey hateful intent through context and metaphor. The generated samples undergo preliminary manual screening to ensure that they match the characteristics of implicit hate speech. The screened samples are then used to augment the training data of supervised learning models, improving their performance on implicit hate speech detection tasks. Our experimental results demonstrate that this approach improves models' ability to detect implicit hate speech.
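
The screen-then-augment step described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the AIHSG implementation: the `Sample` type, the `screen` and `augment` helpers, the approval-by-index convention, and the duplicate filter are all hypothetical, and the LLM generation step and classifier training are left out. Texts are neutral placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One labeled short text (hypothetical structure, not from the paper)."""
    text: str
    label: int                 # 1 = implicit hate speech, 0 = benign
    source: str = "original"   # "original" or "generated"


def screen(candidates, approved_ids):
    """Keep only the generated candidates a human reviewer approved.

    approved_ids holds the positions of approved candidates; this stands in
    for the manual screening step and is an assumed convention.
    """
    return [c for i, c in enumerate(candidates) if i in approved_ids]


def augment(train, generated):
    """Append screened samples to the training set.

    Exact duplicates (after trimming and lowercasing) are skipped; this
    filter is an added assumption, not something the abstract specifies.
    """
    seen = {s.text.strip().lower() for s in train}
    augmented = list(train)
    for s in generated:
        key = s.text.strip().lower()
        if key not in seen:
            seen.add(key)
            augmented.append(s)
    return augmented


# Placeholder data only; real samples would come from a dataset and an LLM.
train = [
    Sample("benign placeholder A", 0),
    Sample("implicit placeholder B", 1),
]
candidates = [
    Sample("generated placeholder C", 1, "generated"),
    Sample("generated placeholder D", 1, "generated"),   # rejected by reviewer
    Sample("Implicit placeholder B ", 1, "generated"),   # duplicate of a training text
]

approved = screen(candidates, approved_ids={0, 2})
augmented = augment(train, approved)
```

The augmented list would then be passed to whatever supervised classifier is being trained. Keeping a `source` field makes it possible to evaluate on original data only, so generated samples do not leak into the test set.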

External IDs: dblp:journals/saem/ZhangDGYX25