Detecting Unintended Social Bias in Toxic Language Datasets

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Hate speech and offensive texts are examples of harmful online content that targets or promotes hatred towards a group or an individual based on actual or perceived aspects of identity, such as race, religion, or sexual orientation. The spread of violent and offensive content has had a significant negative impact on society, and such hate speech and offensive content generally carries societal biases. With the rise of online hate speech, automatically detecting these biases has become a popular natural language processing task. However, little research has been done on detecting unintended social bias in toxic language datasets. In this paper, we introduce a new dataset, curated from an existing toxic language dataset, for detecting social biases along with their categories and targeted groups. We then report baseline performances on both classification and generation tasks on our curated dataset using transformer-based models. Our study motivates a systematic extraction of social bias data from toxic language data.
Paper Type: short
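
To make the described baseline setup concrete, below is a minimal sketch of how a transformer-based bias-category classifier of the kind mentioned in the abstract might be assembled with the Hugging Face transformers library. The model name, label set, and example text are illustrative assumptions only, not the authors' actual configuration or released code.

```python
# Illustrative sketch only: fine-tunable transformer classifier for bias categories.
# The base model, label set, and input text below are assumptions for demonstration.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical bias categories; the paper's actual label inventory may differ.
LABELS = ["no_bias", "racial", "religious", "gender", "sexual_orientation"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

# Example input; in practice this would be text from the curated dataset.
texts = ["example toxic comment goes here"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Predict a bias category per input (model shown untrained; fine-tuning omitted).
with torch.no_grad():
    logits = model(**inputs).logits
predictions = [LABELS[i] for i in logits.argmax(dim=-1)]
print(predictions)
```

A generation baseline for producing the targeted group as free text would follow a similar pattern with a sequence-to-sequence model instead of a classifier.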