Abstract: Innovative transformer-based language models produce contextually aware token embeddings and have achieved state-of-the-art performance on a variety of natural language tasks, but have been shown to encode biases that are unwanted in downstream applications. In this paper, we evaluate the social biases encoded by transformers trained with the masked language modeling objective. Using proposed proxy functions within an iterative masking experiment, we measure the quality of masked language models' (MLMs') predictions and assess their preference for advantaged over disadvantaged groups. We compare our bias estimates with those produced by other evaluation methods on benchmark datasets and assess their alignment with human-annotated biases. We find relatively high religious and disability biases across the considered MLMs, and lower gender bias in one dataset relative to another. We extend previous work by evaluating the social biases introduced after re-training an MLM under the masked language modeling objective, and find that the proposed measures estimate biases introduced by re-training more accurately than measures based on models' relative preference for biased sentences.
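To make the iterative masking setup concrete, the sketch below scores a sentence by masking one token at a time and summing the log-probability the MLM assigns to the original token (a pseudo-log-likelihood), then compares paired sentences referring to different social groups. This is a minimal illustration assuming a HuggingFace-style masked LM such as `bert-base-uncased`; the paper's proposed proxy functions and example sentences are not reproduced here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(sentence, model, tokenizer):
    """Mask each token in turn and sum the log-probability the MLM
    assigns to the original (ground-truth) token at that position."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at the start and end ([CLS]/[SEP]).
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        total += log_probs[input_ids[pos]].item()
    return total

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Illustrative (hypothetical) sentence pair: a higher score for the first
# sentence would indicate a preference for the stereotyping variant.
s1 = "The poor are ignorant about how to handle money."
s2 = "The rich are ignorant about how to handle money."
print(pseudo_log_likelihood(s1, model, tokenizer))
print(pseudo_log_likelihood(s2, model, tokenizer))
```

In practice, such per-sentence scores would be aggregated over a benchmark of paired sentences (e.g., advantaged vs. disadvantaged group mentions) to estimate a model-level bias measure.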
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, model bias/unfairness mitigation, ethical considerations in NLP applications, reflections and critiques
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 12