Abstract: Pretrained language models (PLMs) often encode unwanted social biases. Recent efforts have focused on mitigating the intrinsic bias encoded in PLMs, but separately fine-tuning on downstream applications undermines this intrinsic debiasing: a bias-resurgence issue arises when debiased PLMs are fine-tuned on downstream tasks. To eliminate undesired stereotyped associations in PLMs during fine-tuning, we present Mix-Debias, a mixup-based framework that takes a new unified perspective, directly combining PLM debiasing with fine-tuning on applications. The key to Mix-Debias is applying mixup-based linear interpolation to counterfactually augmented downstream datasets, with additional pairs drawn from external corpora. We further devise an alignment regularizer to ensure that the original augmented pairs and their gender-balanced counterparts remain spatially close. Experimental results show that Mix-Debias reduces biases in PLMs while maintaining promising performance on downstream applications.
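The core operations the abstract describes can be illustrated with a minimal sketch: mixup linearly interpolates an example's representation with its counterfactually gender-swapped counterpart, and an alignment term penalizes the distance between the two representations. This is a generic illustration assuming vector embeddings; the function and variable names (`mixup`, `x_cf`, `alignment_loss`) are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def mixup(x, x_cf, lam=None, alpha=0.4):
    """Linearly interpolate an embedding with its counterfactual counterpart.

    lam is drawn from Beta(alpha, alpha) when not given, as in standard mixup.
    """
    if lam is None:
        lam = np.random.beta(alpha, alpha)
    return lam * x + (1.0 - lam) * x_cf

def alignment_loss(x, x_cf):
    """Squared Euclidean distance keeping a pair and its counterpart close."""
    return float(np.sum((x - x_cf) ** 2))

# Toy sentence embeddings: original vs. gender-swapped counterpart.
x = np.array([0.2, 0.8, -0.5])
x_cf = np.array([0.6, 0.4, -0.1])

x_mix = mixup(x, x_cf, lam=0.5)   # midpoint of the two embeddings
print(x_mix)                      # [ 0.4  0.6 -0.3]
print(alignment_loss(x, x_cf))    # 0.48
```

In practice the interpolation would be applied to model hidden states (and labels) during fine-tuning, with the alignment term added to the task loss.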