Euphemistic Abuse - A New Dataset and Classification Experiments for Implicitly Abusive Language

Michael Wiegand, Jana Kampfmeier, Elisabeth Eder, Josef Ruppenhofer

Published: 01 Jan 2023, Last Modified: 12 Jun 2024, Venue: EMNLP 2023, License: CC BY-SA 4.0

Abstract: We address the task of identifying euphemistic abuse (e.g., “You inspire me to fall asleep”) that paraphrases simple, explicitly abusive utterances (e.g., “You are boring”). For this task, we introduce a novel dataset created via crowdsourcing. Special attention has been paid to the generation of appropriate negative (non-abusive) data. We report on classification experiments showing that classifiers trained on previous datasets are less capable of detecting such abuse. The best automatic results are obtained by a classifier that augments training data from our new dataset with automatically generated GPT-3 completions. We also present a classifier that combines a few manually extracted features exemplifying the major linguistic phenomena that constitute euphemistic abuse.