Giving Control Back to Models: Enabling Offensive Language Detection Models to Autonomously Identify and Mitigate Biases
Abstract: The rapid development of social media has led to an increase in online harassment and offensive speech, posing significant challenges for effective content moderation. Existing automated detection models often exhibit a bias toward predicting offensiveness from specific vocabulary, which not only compromises model fairness but can also exacerbate biases against vulnerable and minority groups. To address these issues, this paper proposes a bias self-awareness and data self-iteration framework for mitigating model biases. The framework aims to "give control back to models," enabling offensive language detection models to autonomously identify and mitigate biases through a bias self-awareness algorithm and a self-iterative data augmentation method. Experimental results demonstrate that the proposed framework effectively reduces models' false positive rates in both in-distribution and out-of-distribution tests, improves model accuracy and fairness, and shows promising performance gains in detecting offensive speech on larger-scale datasets.
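The abstract describes vocabulary-driven false positives as the core symptom of model bias. As a minimal sketch (not the authors' algorithm), the snippet below illustrates one way such lexical bias could be quantified: comparing a detector's false positive rate on benign texts that contain suspected trigger terms against benign texts that do not. The `predict` callable, `benign_texts`, and `trigger_terms` are hypothetical placeholders.

```python
# Minimal sketch (assumed setup, not the paper's bias self-awareness method):
# measure whether false positives concentrate on specific trigger vocabulary.
from typing import Callable, List, Tuple

def lexical_bias_report(
    predict: Callable[[str], int],   # hypothetical classifier: 1 = offensive, 0 = not
    benign_texts: List[str],         # texts annotated as non-offensive
    trigger_terms: List[str],        # vocabulary suspected of driving spurious predictions
) -> Tuple[float, float]:
    """Return false positive rates on benign texts with / without trigger terms."""
    with_trigger = [t for t in benign_texts if any(w in t for w in trigger_terms)]
    without_trigger = [t for t in benign_texts if not any(w in t for w in trigger_terms)]

    def fpr(texts: List[str]) -> float:
        # Fraction of benign texts the model flags as offensive.
        if not texts:
            return 0.0
        return sum(predict(t) == 1 for t in texts) / len(texts)

    return fpr(with_trigger), fpr(without_trigger)

# A large gap between the two rates suggests the model relies on surface
# vocabulary (a spurious artifact) rather than on actual offensiveness.
```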
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias correction, offensive language detection, bias self-awareness, spurious artifacts
Contribution Types: NLP engineering experiment
Languages Studied: Chinese
Submission Number: 4926