Hate Speech Detection in Somali-English Code-Switched Texts

Published: 2025 · Last Modified: 21 Jan 2026 · NLPCC (3) 2025 · CC BY-SA 4.0
Abstract: The use of large language models (LLMs) has grown significantly worldwide, offering numerous benefits but also posing risks of misuse. For example, LLMs can generate harmful content, such as hate speech targeting specific individuals or groups. Although recent research has begun addressing the detection of LLM-generated hate speech, low-resource languages remain markedly underrepresented. As LLMs are increasingly adopted by culturally and linguistically diverse communities, it is essential to evaluate their impact across all languages they are trained on, including those with limited resources. Code-switching, the practice of alternating between two or more languages within a single piece of content, presents unique challenges for automated hate speech detection. This study investigates the capability of LLMs to detect hate speech in Somali-English code-switched texts and introduces an evaluation framework that integrates local linguistic knowledge. We employ in-context few-shot learning and retrieval-augmented generation to enhance detection performance. Moreover, we develop a high-quality benchmark dataset consisting of 3,012 Somali-English code-switched texts containing explicit hate speech. Our findings reveal that while LLMs perform well in detecting hate speech in English segments, they struggle with the Somali segments, especially when the English portion expresses strong positive sentiment. Our proposed linguistic adjustments and strategies significantly enhance LLM performance in these multilingual and code-switched contexts. Our code and dataset are available at: https://github.com/Abdisalam-Badel/SECSHSD
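The abstract mentions in-context few-shot learning as one of the detection strategies. A minimal sketch of how such a prompt might be assembled is shown below; the example texts, label names, and function names are illustrative assumptions, not drawn from the paper's SECSHSD dataset or actual prompts.

```python
# Hypothetical sketch of few-shot prompt construction for classifying
# Somali-English code-switched text as hate speech or not. The labeled
# examples here are placeholders, NOT real entries from the dataset.

FEW_SHOT_EXAMPLES = [
    # (code-switched text, label) -- illustrative placeholders only
    ("Waad mahadsan tahay, great work everyone!", "not_hate"),
    ("<explicit slur targeting a group>", "hate"),
]

def build_prompt(text: str) -> str:
    """Assemble an in-context few-shot classification prompt for an LLM."""
    parts = [
        "Classify the following Somali-English code-switched text as "
        "'hate' or 'not_hate'. Consider both the Somali and English segments."
    ]
    for example, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Text: {example}\nLabel: {label}")
    # The final item leaves the label blank for the model to complete.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)

prompt = build_prompt("Aad baad u fiican tahay, but people like them should vanish.")
print(prompt)
```

In a full pipeline this prompt string would be sent to an LLM and the completed label parsed from the response; the retrieval-augmented variant described in the abstract would presumably retrieve relevant labeled examples instead of using a fixed list.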