Unmasking the Trade-off: Measuring Gender Bias Mitigation and Over-debiasing Effects in Pretrained Language Models

Anonymous

16 Oct 2021 (modified: 05 May 2023), ACL ARR 2021 October Blind Submission, Readers: Everyone
Abstract: Pretrained language models (PLMs) have demonstrated success across many natural language processing tasks. However, evidence suggests that they encode the gender bias present in the corpora they are trained on. Existing bias mitigation methods are usually devised to remove all associations related to gender. This can hurt the performance of PLMs, because it can lead to a loss of genuine and factual associations (e.g., no longer associating the word "mother" with females more strongly than with males). To measure the extent of this undesirable loss of gender associations (i.e., over-debiasing), we introduce the Desirable Associations evaluation corpus for Gender (DA-Gender). We find that three popular debiasing methods result in a substantial undesirable loss of gender associations. Our results highlight the importance of mitigating bias without removing genuine gender associations, and our dataset constitutes the first benchmark for evaluating over-debiasing.
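
To illustrate the kind of genuine gender association the abstract refers to, the short sketch below probes a masked language model's pronoun probabilities in a factually gendered context. This is a hypothetical illustration, not the DA-Gender benchmark or the paper's evaluation procedure; the model name and probe sentence are assumptions.

# Hypothetical probe of a "genuine" gender association, not the DA-Gender benchmark.
# Assumes the Hugging Face transformers library; model and sentence are illustrative.
from transformers import pipeline

# Load a masked language model (any PLM with a fill-mask head would do).
fill = pipeline("fill-mask", model="bert-base-uncased")

# A context where the gender association is factual rather than stereotypical.
sentence = "My mother said [MASK] would pick me up after school."

# Restrict predictions to the two pronouns of interest and compare their scores.
scores = {p["token_str"]: p["score"] for p in fill(sentence, targets=["she", "he"])}
print(scores)

# Intuitively, an over-debiased model would show the gap between the scores for
# "she" and "he" collapsing toward zero even in such factually gendered contexts.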