Multilingual offensive lexicon annotated with contextual information

Anonymous

16 Nov 2021 (modified: 05 May 2023) | ACL ARR 2021 November Blind Submission | Readers: Everyone

Abstract: Detecting online hate speech and offensive comments is not a trivial research problem, since pragmatic (contextual) factors influence what is considered offensive. Moreover, offensive terms are rarely found in classical lexical resources such as wordnets and sentiment and emotion lexicons. In this paper, we embrace the challenges and opportunities of the area and introduce the first multilingual offensive lexicon (MOL), composed of 1,000 explicit and implicit pejorative terms and expressions annotated with contextual information. The terms and expressions were manually extracted by a specialist from abusive Instagram comments originally written in Portuguese, and were then manually translated by native speakers of American English, Latin American Spanish, African French, and German. Each expression was annotated by three different annotators, with high inter-annotator agreement. This resource thus provides a new perspective for exploring abusive language detection.
