Towards Automatic Online Hate Speech Intervention Generation using Pretrained Language Model

Raj Ratn Pranesh, Ambesh Shekhar, Anish Kumar

19 Oct 2020 (modified: 21 Oct 2020) | OpenReview Anonymous Preprint Blind Submission | Readers: Everyone

Abstract: Social media today harbours a substantial volume of toxic and hateful conversation, and curbing it has emerged as a critical challenge for governments and organizations globally. Prior research has concentrated primarily on detecting online hate speech while ignoring the further action needed to discourage individuals from using hate speech in the future. Counterspeech is an effective way to tackle online hate while leaving freedom of speech untouched: the aim is to intervene directly in the conversation with textual responses that counter the hateful content and prevent it from spreading further. In this paper, we propose a novel natural language generation task for hate speech intervention, where the goal is to automatically generate responses that intervene in online conversations containing hate speech. We sequentially analyzed the performance and capability of several state-of-the-art pretrained language models for dialogue generation as automated hate speech intervention systems, using both automatic metrics and manual human evaluation. The results indicate that the generated intervention responses are very promising in terms of relevance and contextual meaning.
