Abstract: Pretrained Language Models (PLMs) are widely used in NLP for various tasks. Recent studies have identified various biases that such models exhibit and have proposed methods to correct them. However, most of these works address a limited set of bias dimensions, such as gender, race, or religion, independently of one another. Moreover, the methods typically involve finetuning the full model in order to maintain performance on the downstream task. In this work, we aim to modularly debias a pretrained language model across multiple dimensions. Previous works extensively explored debiasing PLMs using limited, US-centric counterfactual data augmentation (CDA). We use structured knowledge and a large generative model to build diverse CDA data across multiple bias dimensions in a semi-automated way. We highlight that existing debiasing methods do not consider interactions between multiple societal biases, and we propose a debiasing model that exploits the synergy among various societal biases and enables debiasing along multiple bias dimensions simultaneously. An extensive evaluation on multiple tasks and languages demonstrates the efficacy of the approach.
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English (En), French (Fr), Italian (It), Hindi (Hi), Tamil (Ta), Marathi (Mr), Swahili (Sw), Gujarati (Gu)
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.