Mitigation and Evaluation for Gender Stereotype under Unfair Escape

ACL ARR 2024 June Submission3783 Authors

16 Jun 2024 (modified: 05 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Gender bias and stereotypes have long been concerns in language models. The training data for these models is drawn from social products, which inevitably introduces potential unfairness. Existing datasets and methods pay insufficient attention to diversified perspectives and mitigation efficiency. To address these issues, we propose an integrated, closed-loop framework covering data construction, mitigation, and evaluation for this task. Within this framework, we build a diversified generative evaluation dataset that covers multiple perspectives on gender prejudice as well as the unfair-escape behavior that LLMs exhibit. We further propose balanced prompting to effectively alleviate the models' inherent bias. To assess the unbiased capability of LLMs, we introduce an opinion consistency evaluation method. Extensive experiments demonstrate the effectiveness of the proposed framework. Our code and datasets will be released at \href{https://anonymous.4open.science/r/Bias_dataset-8565}{https://anonymous.4open.science/r/Bias\_dataset-8565}.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation; model bias/unfairness mitigation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, Chinese
Submission Number: 3783