Large-Scale Hate Speech Detection with Cross-Domain Transfer

Anonymous

16 Oct 2021 (modified: 05 May 2023) | ACL ARR 2021 October Blind Submission | Readers: Everyone

Abstract: Hate speech towards people with different backgrounds is a major problem on social media. Although there have been various attempts to detect hate speech automatically with supervised learning models, the performance of such models depends on the limited datasets on which they are trained. In this study, we construct large-scale tweet datasets for supervised hate speech detection in English and Turkish, each containing 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets in each of five domains: religion, gender, race, politics, and sports. We analyze the performance of state-of-the-art language models on large-scale hate speech detection, with a special focus on model scalability. We also examine the cross-domain transfer ability of hate speech detection.
