Unveiling the Hate: Generating Faithful and Plausible Explanations for Implicit and Subtle Hate Speech Detection

Published: 01 Jan 2024, Last Modified: 16 May 2025, Venue: NLDB (1) 2024, License: CC BY-SA 4.0
Abstract: In today’s digital age, the huge amount of abusive content and hate speech on social media platforms presents a significant challenge. Natural Language Processing (NLP) methods have focused on detecting explicit forms of hate speech, often overlooking more nuanced and implicit instances. To address this gap, our paper aims to enhance the detection and understanding of implicit and subtle hate speech. More precisely, we propose a comprehensive approach combining prompt construction, free-text generation, few-shot learning, and fine-tuning to generate explanations for hate speech classification, with the goal of providing more context for content moderators to unveil the actual nature of a message on social media.