Debiasing Pre-trained Language Models for Gender Pronoun Resolution

Anonymous

03 Sept 2022 (modified: 05 May 2023) · ACL ARR 2022 September Blind Submission
Abstract: Leveraging pre-trained language models (PLMs) has become a universal approach for various natural language processing tasks. While these models achieve strong performance in general, they also reproduce prejudices against certain groups that stem from imbalanced pre-training data (e.g., corpora with more male examples). In this paper, we tackle gender bias in the Gender Pronoun Resolution (GPR) task. PLMs exhibit two types of gender bias: stereotype and skew. Whereas previous studies have mainly focused on the skew problem, we aim to mitigate both. Our method employs two regularization terms, Stereotype Neutralization (SN) and Elastic Weight Consolidation (EWC). Models trained with these terms are shown to be neutralized, reducing bias significantly on the WinoBias GPR dataset compared to the public BERT. We also introduce a new metric for quantifying gender bias, the Stereotype Quantification (SQ) score. In addition to the metrics, embedding visualizations are used to interpret how our methods have debiased the models.
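The abstract describes a training objective that adds two regularizers to the task loss. Below is a minimal PyTorch sketch of how such a combined objective could look. The EWC term follows the standard formulation (a Fisher-weighted quadratic penalty on parameter drift from the pre-trained weights); the SN term, the gender-direction construction, the toy model, and the coefficients `lambda_sn` / `lambda_ewc` are illustrative assumptions, not the paper's actual definitions.

```python
# Hedged sketch: task loss + assumed SN penalty + standard EWC penalty.
import torch
import torch.nn as nn

def ewc_penalty(model: nn.Module, ref_params: dict, fisher: dict) -> torch.Tensor:
    """Elastic Weight Consolidation (Kirkpatrick et al., 2017): keep parameters
    close to their pre-trained values, weighted by an estimate of the
    diagonal Fisher information."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - ref_params[name]) ** 2).sum()
    return loss

def sn_penalty(word_embeddings: torch.Tensor, gender_dir: torch.Tensor) -> torch.Tensor:
    """Hypothetical Stereotype Neutralization term: penalize the squared
    projection of embeddings onto a gender direction. The paper's exact
    SN formulation may differ."""
    proj = word_embeddings @ gender_dir  # (vocab_size,)
    return (proj ** 2).mean()

# --- toy usage with a stand-in "PLM" (embedding layer + linear classifier) ---
vocab, dim = 100, 16
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(), nn.Linear(dim, 2))

# Snapshot of the "pre-trained" weights, plus a crude uniform Fisher
# estimate (purely for illustration; real EWC estimates it from data).
ref = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

# Gender direction from two arbitrary anchor rows, standing in for the
# embeddings of a gendered word pair such as "he"/"she".
emb = model[0].weight
gender_dir = nn.functional.normalize(emb[0] - emb[1], dim=0).detach()

x = torch.randint(0, vocab, (8, 1))
y = torch.randint(0, 2, (8,))
task_loss = nn.functional.cross_entropy(model(x), y)

lambda_sn, lambda_ewc = 0.1, 0.1  # assumed weighting coefficients
total = (task_loss
         + lambda_sn * sn_penalty(emb, gender_dir)
         + lambda_ewc * ewc_penalty(model, ref, fisher))
total.backward()
```

The design intuition: the SN-style term pushes embeddings toward gender neutrality, while the EWC term counteracts catastrophic forgetting by anchoring the model to its pre-trained weights, so debiasing does not erase general language knowledge.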
Paper Type: long