Beyond English: Offensive Language Detection in Low-Resource Nigerian Languages

Published: 03 Mar 2024, Last Modified: 11 Apr 2024, Venue: AfricaNLP 2024, License: CC BY 4.0
Keywords: Natural Language Processing, African languages, Offensive language detection
TL;DR: Offensive language detection in the three major Nigerian languages
Abstract: The proliferation of online offensive language necessitates the development of effective detection mechanisms, especially in multilingual contexts. This study addresses the challenge by developing and introducing novel datasets for hate speech detection in three major Nigerian languages: Hausa, Yoruba, and Igbo. We collected data from Twitter and manually annotated it to create datasets for each of the three languages, using native speakers. We used pre-trained language models to evaluate their efficacy in detecting offensive language in our datasets. The best-performing model achieved an accuracy of 90%. To further support research in offensive language detection, we plan to make the dataset and our model publicly available.
Submission Number: 51