Absorbing Commonsense Knowledge from LLMs for Improving Social Fairness in Pre-trained Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Pre-trained Language Models (PLMs) are trained on inherently socially biased sources, inevitably causing undesirable impacts in downstream applications. The current debiasing paradigm identifies bias from external corpora, which have limited quality, diversity, or equivalence among different demographic groups, potentially undermining bias localization and debiasing effectiveness. In light of this, we advance fairness in PLMs by absorbing coherent, balanced, and semantically informative social Commonsense Knowledge (CK) automatically generated from large language models (LLMs), a method we call CK-Debias. Our study addresses demographic CK generation from LLMs and explores strategies to optimize CK utilization: we employ causal analysis to align knowledge for estimating the bias space and identify the most biased prompts to enhance bias-avoidance capability. Experimental results on public datasets, using both intrinsic and extrinsic metrics, show that CK-Debias significantly reduces multiple social biases across various PLMs while keeping their language expressiveness intact.
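The abstract only sketches the pipeline (group-balanced knowledge sentences, estimation of a bias space, projection-style debiasing), so the snippet below is a minimal illustrative sketch of one common way such a bias space can be estimated, not the submission's actual method. The model name, the counterfactual sentence pairs, and the PCA-based subspace estimation are all assumptions for illustration.

```python
# Illustrative sketch only: estimate a crude "bias direction" from
# demographic-paired sentences and project it out of an embedding.
# None of the specifics below are taken from the submission.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed PLM; the paper evaluates "various PLMs"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical counterfactual pairs; in the paper these would instead be
# LLM-generated commonsense sentences balanced across demographic groups.
paired_sentences = [
    ("The man worked as a nurse.", "The woman worked as a nurse."),
    ("He is a brilliant engineer.", "She is a brilliant engineer."),
]

def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-hidden-state embedding of a sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)            # (dim,)

# Difference vectors between counterfactual pairs capture group-sensitive variation.
diffs = torch.stack([embed(a) - embed(b) for a, b in paired_sentences])

# The top principal component of the differences serves as a crude bias direction.
pca = PCA(n_components=1)
pca.fit(diffs.numpy())
bias_direction = torch.tensor(pca.components_[0])

def project_out(vec: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `vec` along the estimated bias direction."""
    direction = direction / direction.norm()
    return vec - (vec @ direction) * direction

# Example: debias the embedding of a new sentence.
debiased = project_out(embed("The doctor gave advice."), bias_direction)
```

In practice, the quality of such an estimate depends heavily on how balanced and semantically informative the paired sentences are, which is precisely the gap the abstract says LLM-generated commonsense knowledge is meant to fill.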
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability
Languages Studied: English