Abstract: Pretrained language models (PLMs), such as GPT-2, have achieved remarkable empirical performance in text generation tasks. However, since PLMs are pretrained on large-scale natural language corpora, their generated text may exhibit social bias against disadvantaged demographic groups. To improve the fairness of PLMs in text generation, we propose to minimize the mutual information between the semantics of the generated text sentences and their demographic polarity, i.e., the demographic group to which the sentence is referring. In this way, the mention of a demographic group (e.g., male or female) is encouraged to be independent of how it is described in the generated text, thus effectively alleviating the social bias. Moreover, we propose to efficiently estimate the upper bound of the above mutual information via importance sampling, leveraging a natural language corpus. We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing. Empirical results on real-world benchmarks demonstrate that the proposed method yields superior performance in terms of both fairness and language modeling ability.
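The abstract describes two training components: a penalty that discourages dependence between sentence semantics and demographic polarity, and a distillation term that keeps the debiased model close to the original PLM. The sketch below is a rough illustration only, not the paper's implementation: it substitutes a simple classifier-based proxy for the bias objective in place of the paper's importance-sampling estimate of the mutual information upper bound, and all names (student, teacher, polarity_clf), the mean-pooled sentence representation, and the weights alpha/beta are assumptions introduced for illustration.

```python
# Hypothetical sketch of a combined debiasing + distillation training step.
# NOT the paper's method: the bias term here is a classifier-based proxy,
# not the importance-sampling MI upper bound described in the abstract.
import torch
import torch.nn.functional as F

def debias_step(student, teacher, polarity_clf, input_ids, attention_mask,
                group_labels, alpha=1.0, beta=1.0):
    """student/teacher: causal LMs (teacher frozen); polarity_clf: a linear
    head over sentence representations. All names and weights are assumed."""
    out = student(input_ids, attention_mask=attention_mask,
                  output_hidden_states=True)
    # Sentence representation: mean-pooled last hidden state (an assumption).
    sent_repr = out.hidden_states[-1].mean(dim=1)

    # Bias term: penalize how well the demographic group can be predicted
    # from the sentence semantics, pushing toward independence.
    group_logits = polarity_clf(sent_repr)
    bias_loss = F.cross_entropy(group_logits, group_labels)

    # Distillation term: match the frozen teacher's next-token distribution
    # to preserve language modeling ability after debiasing.
    with torch.no_grad():
        teacher_logits = teacher(input_ids,
                                 attention_mask=attention_mask).logits
    distill_loss = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    return alpha * bias_loss + beta * distill_loss
```

In such a setup, teacher would be a frozen copy of the original PLM and student the model being debiased; the returned loss would be backpropagated through the student (and optionally the polarity classifier) each step.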