Offensive Text Detection Across Languages and Datasets Using Rule-based and Hybrid Methods


16 Nov 2021 ACL ARR 2021 November
Abstract: We investigate the potential of rule-based systems for the task of offensive text detection in English and German, and demonstrate their effectiveness in low-resource settings, as an alternative or addition to transfer learning across tasks and languages. Task definitions and annotation guidelines used by existing datasets vary widely, so state-of-the-art machine learning models do not transfer well across datasets or languages. Furthermore, such systems lack explainability and pose a critical risk of unintended bias. We present simple rule systems based on semantic graphs for classifying offensive text in two languages and provide both quantitative and qualitative comparisons of their performance with deep learning models on five datasets across multiple languages and shared tasks.
