Absorbing Commonsense Knowledge from LLMs for Improving Social Fairness in Pre-trained Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Pre-trained Language Models (PLMs) are trained on inherently socially biased sources, inevitably causing undesirable impacts in downstream applications. The current debiasing paradigm identifies bias from external corpora, which have limited quality, diversity, or equivalence among different demographic groups, potentially undermining bias localization and debiasing effectiveness. In light of this, we advance fairness in PLMs by absorbing coherent, balanced, and semantically informative social Commonsense Knowledge (CK) automatically generated from large language models (LLMs), a method we call CK-Debias. Our study addresses demographic CK generation from LLMs and explores strategies to optimize CK utilization: we employ causal analysis to align knowledge for estimating the bias space and identify the most biased prompts to enhance bias-avoidance capability. Experimental results on public datasets, using both intrinsic and extrinsic metrics, show that CK-Debias significantly reduces multiple social biases across various PLMs while keeping their language expressiveness intact.
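The abstract only sketches the pipeline (group-balanced knowledge sentences, estimation of a bias space, projection-style debiasing), so the snippet below is a minimal illustrative sketch of one common way such a bias space can be estimated, not the submission's actual method. The model name, the counterfactual sentence pairs, and the PCA-based subspace estimation are all assumptions for illustration.

```python
# Illustrative sketch only: estimate a crude "bias direction" from
# demographic-paired sentences and project it out of an embedding.
# None of the specifics below are taken from the submission.
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed PLM; the paper evaluates "various PLMs"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical counterfactual pairs; in the paper these would instead be
# LLM-generated commonsense sentences balanced across demographic groups.
paired_sentences = [
    ("The man worked as a nurse.", "The woman worked as a nurse."),
    ("He is a brilliant engineer.", "She is a brilliant engineer."),
]

def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-hidden-state embedding of a sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)            # (dim,)

# Difference vectors between counterfactual pairs capture group-sensitive variation.
diffs = torch.stack([embed(a) - embed(b) for a, b in paired_sentences])

# The top principal component of the differences serves as a crude bias direction.
pca = PCA(n_components=1)
pca.fit(diffs.numpy())
bias_direction = torch.tensor(pca.components_[0])

def project_out(vec: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `vec` along the estimated bias direction."""
    direction = direction / direction.norm()
    return vec - (vec @ direction) * direction

# Example: debias the embedding of a new sentence.
debiased = project_out(embed("The doctor gave advice."), bias_direction)
```

In practice, the quality of such an estimate depends heavily on how balanced and semantically informative the paired sentences are, which is precisely the gap the abstract says LLM-generated commonsense knowledge is meant to fill.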
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability
Languages Studied: English