SAHSD: Enhancing Hate Speech Detection in LLM-Powered Web Applications via Sentiment Analysis and Few-Shot Learning

Published: 29 Jan 2025, Last Modified: 29 Jan 2025 · WWW 2025 Poster · CC BY 4.0
Track: Security and privacy
Keywords: Security; Machine Learning; Large Language Model; Hate Speech
TL;DR: We introduce SAHSD, a method that enhances hate speech detection in LLM-powered web applications by combining sentiment analysis and few-shot learning, achieving superior accuracy with minimal training data for real-time content moderation.
Abstract: As large language models (LLMs) increasingly power web applications, including social networks, moderating hate speech has become a critical concern for the Web. These LLM-powered applications, while offering near-human interaction capabilities, are vulnerable to harmful or biased content due to imperfect training data scraped from the Web. Current hate speech detection methods often struggle with limited annotated data, especially for real-time moderation on these platforms. This paper introduces Sentiment-Aided Hate Speech Detection (SAHSD), a novel approach designed to enhance hate speech detection specifically in LLM-powered web applications. By treating hate speech detection as a few-shot learning task, SAHSD utilizes sentiment analysis to refine pre-trained language models (LMs) for improved accuracy in recognizing harmful content. SAHSD first employs publicly available sentiment datasets to train a sentiment analysis model, which is then fine-tuned by merging sentiment prompts with hate speech prompts, enabling efficient and accurate detection even with limited training samples. The effectiveness of SAHSD is demonstrated through experiments on widely used web-sourced datasets such as SBIC and HateXplain. SAHSD achieves an F1-score of 0.99 with only 64 training samples and outperforms advanced techniques such as ToKen, MRP, and HARE, with improvements of 33% on SBIC and 95% on HateXplain. SAHSD also surpasses GPT-4 in generalization performance across multiple datasets, showing an 8% improvement when trained on equal-sized samples. These results underscore SAHSD's potential to enhance content moderation in LLM-driven web platforms, contributing to a safer, more inclusive, and accountable Web ecosystem.
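The abstract describes merging sentiment prompts with hate speech prompts in a few-shot setting, but gives no concrete templates. The sketch below illustrates one plausible way such prompt merging could look; every function name, template, and the toy lexicon-based sentiment cue are hypothetical (the paper trains a sentiment model on public datasets, not a lexicon).

```python
# Illustrative sketch of sentiment-augmented few-shot prompt construction.
# All names and templates are assumptions, not the paper's actual method.

def sentiment_prompt(text: str) -> str:
    """Toy sentiment cue; SAHSD would instead use a sentiment model
    trained on publicly available sentiment datasets."""
    negative = {"hate", "awful", "disgusting", "stupid"}
    label = "negative" if any(w in text.lower().split() for w in negative) \
        else "neutral/positive"
    return f"Sentiment: {label}."

def merged_prompt(text: str, few_shot_examples: list[tuple[str, str]]) -> str:
    """Merge a sentiment prompt with a hate-speech prompt, prepending
    a handful of labelled examples (the few-shot setting)."""
    shots = "\n".join(
        f"Text: {ex}\n{sentiment_prompt(ex)}\nHate speech: {lbl}"
        for ex, lbl in few_shot_examples
    )
    return (
        f"{shots}\n"
        f"Text: {text}\n{sentiment_prompt(text)}\n"
        f"Hate speech:"
    )

examples = [("You people are disgusting", "yes"), ("Have a great day", "no")]
print(merged_prompt("I hate that group", examples))
```

In a full pipeline, the merged prompt would be fed to the fine-tuned LM, whose completion ("yes"/"no") serves as the hate speech prediction.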
Submission Number: 1708
