Unmasking the Trade-off: Measuring Gender Bias Mitigation and Over-debiasing Effects in Pretrained Language Models
Abstract: Pretrained language models (PLMs) have demonstrated success across many natural language processing tasks. However, they have been shown to encode the gender bias present in the corpora they are trained on. Existing bias mitigation methods are usually devised to remove all associations related to gender, which can hurt the performance of PLMs because typical associations (e.g., associating the word "mother" with the female gender) may be lost as well. To measure the extent of this loss of typical gender associations (i.e., over-debiasing), we introduce the Typical Associations evaluation corpus for Gender (TA-Gender). We find that three popular debiasing methods result in a substantial loss of typical gender associations. Our results highlight the importance of mitigating bias without removing typical gender associations, and our dataset constitutes the first benchmark for evaluating this information loss.
Paper Type: long