FairShades: Fairness Auditing via Explainability in Abusive Language Detection Systems

Published: 01 Jan 2021, Last Modified: 27 Jun 2023, CogMI 2021
Abstract: At every stage of a supervised learning process, harmful biases can arise and be inadvertently introduced, ultimately leading to marginalization, discrimination, and abuse towards minorities. This phenomenon becomes particularly impactful in the sensitive real-world context of abusive language detection systems, where non-discrimination is difficult to assess. In addition, given the opaqueness of their internal behavior, the dynamics leading a model to a certain decision are often neither clear nor accountable, and significant problems of trust can emerge. A robust value-oriented evaluation of models' fairness is therefore necessary. In this paper, we present FairShades, a model-agnostic approach for auditing the outcomes of abusive language detection systems. Combining explainability and fairness evaluation, FairShades can identify unintended biases and the sensitive categories towards which models are most discriminative. This objective is pursued through the auditing of meaningful counterfactuals generated within the CheckList framework. We conduct several experiments on BERT-based models to demonstrate our proposal's novelty and effectiveness for unmasking biases.
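The sketch below is a minimal, hypothetical illustration (not code from the paper) of how counterfactual test cases could be generated with the CheckList framework's template editor; the template, identity terms, and the downstream audit step are illustrative placeholders.

```python
# Illustrative sketch: generating counterfactual sentences with CheckList
# (Ribeiro et al., 2020). The template and identity terms are hypothetical
# examples, not taken from the FairShades paper.
from checklist.editor import Editor

editor = Editor()

# Identity terms along one sensitive category (placeholder list).
identities = ["women", "men", "immigrants", "muslims", "christians"]

# Fill the same template with each identity term, yielding counterfactuals
# that differ only in the protected group being mentioned.
samples = editor.template("I really can't stand {identity}.", identity=identities)

# samples.data holds the generated sentences; a fairness audit would compare
# the abusive-language classifier's predictions across them and flag large
# per-identity disparities.
for sentence in samples.data:
    print(sentence)
```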