Keywords: NLP, Debiasing pre-trained language models, Social biases, Robustness
Abstract: Recent studies have revealed that widely-used pre-trained language models propagate societal biases from their large, unmoderated pre-training corpora. Existing solutions mostly focus on debiasing the pre-training corpora or the embedding models. These approaches therefore require a separate pre-training process and extra training datasets, which are resource-intensive and costly. Moreover, studies have shown that these approaches hurt the models' performance on downstream tasks. In this study, we focus on gender debiasing and propose Gender-tuning, which comprises two training processes: gender-word perturbation and fine-tuning. This combination aims to interrupt the association of gender words with other words in training examples while classifying the perturbed example according to the ground-truth label. Gender-tuning uses a joint loss to train the perturbation model and the fine-tuning together. Comprehensive experiments show that Gender-tuning effectively reduces gender bias scores in pre-trained language models and, at the same time, improves performance on downstream tasks. Gender-tuning is applicable as a plug-and-play debiasing tool for pre-trained language models. The source code and pre-trained models will be available on the author’s GitHub page.
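To make the abstract's description concrete, below is a minimal illustrative sketch of a joint training step that combines a gender-word perturbation with standard fine-tuning under one loss. It is an assumption-laden simplification, not the authors' implementation: the perturbation here is a hypothetical rule-based word swap (`GENDER_WORD_MAP`, `perturb_gender_words`) rather than the learned perturbation model described in the paper, and the model/checkpoint names are placeholders.

```python
# Illustrative sketch only: rule-based gender-word perturbation plus fine-tuning
# under a single joint loss. The paper's actual method trains a perturbation
# model jointly with fine-tuning; this simplified version swaps gender words
# with a fixed lookup table to show the overall training flow.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical gender-word substitution table used to perturb training examples.
GENDER_WORD_MAP = {"he": "she", "she": "he", "his": "her", "her": "his",
                   "him": "her", "man": "woman", "woman": "man"}

def perturb_gender_words(text: str) -> str:
    """Swap gender words to interrupt their association with other words."""
    return " ".join(GENDER_WORD_MAP.get(tok.lower(), tok) for tok in text.split())

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def training_step(text: str, label: int) -> float:
    """One step: classify both the original and the gender-perturbed example
    with the ground-truth label, and backpropagate their summed (joint) loss."""
    model.train()
    losses = []
    for variant in (text, perturb_gender_words(text)):
        inputs = tokenizer(variant, return_tensors="pt", truncation=True)
        out = model(**inputs, labels=torch.tensor([label]))
        losses.append(out.loss)
    joint_loss = sum(losses)  # joint loss over original + perturbed views
    optimizer.zero_grad()
    joint_loss.backward()
    optimizer.step()
    return joint_loss.item()
```

Usage would simply loop `training_step` over the downstream task's labeled examples, so the procedure plugs into an ordinary fine-tuning pipeline without a separate pre-training stage.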
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (eg, AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)