Mitigating Bias in LLMs via EquiSync: A Multi-Objective Optimization Perspective

ACL ARR 2024 June Submission4222 Authors

16 Jun 2024 (modified: 02 Jul 2024) · CC BY 4.0
Abstract: The field of Natural Language Processing (NLP) has seen remarkable advancements in Large Language Models (LLMs). Despite these advancements, a persistent challenge remains: LLMs often produce biased outputs. This paper introduces EquiSync, a novel method designed to mitigate social bias in LLMs without significantly compromising their performance. EquiSync utilizes a multi-agent framework, incorporating three agents that employ a two-phase approach: Attributes Masking and Attributes Balancing. This method aligns with human values transparently and reduces disparities between social groups. Unlike traditional debiasing techniques, which often lead to performance degradation, EquiSync achieves substantial bias reduction while maintaining or even improving accuracy on downstream tasks. Our experiments demonstrate that EquiSync reduces bias scores by up to 87.7%, with only a marginal performance degradation of at most 6.8% on the BBQ dataset. Additionally, it significantly improves the multi-objective ICAT metric on the StereoSet dataset by up to 56.98%.
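The abstract does not detail how the Attributes Masking phase operates, but the general idea of masking protected-attribute terms before downstream processing can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attribute lexicon, the placeholder scheme, and the function name `mask_attributes` are all assumptions for demonstration purposes.

```python
import re

# Hypothetical lexicon of protected-attribute terms; a real system would use a
# much larger, curated vocabulary (this is NOT the paper's actual lexicon).
ATTRIBUTE_LEXICON = {
    "gender": ["he", "she", "man", "woman", "male", "female"],
    "religion": ["christian", "muslim", "jewish", "hindu", "buddhist"],
}

def mask_attributes(text: str, lexicon=ATTRIBUTE_LEXICON) -> str:
    """Replace protected-attribute terms with category placeholders."""
    masked = text
    for category, terms in lexicon.items():
        for term in terms:
            # \b word boundaries avoid masking substrings (e.g. "he" in "The")
            pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
            masked = pattern.sub(f"[{category.upper()}]", masked)
    return masked

print(mask_attributes("The woman said she was tired"))
# → The [GENDER] said [GENDER] was tired
```

Under a scheme like this, a downstream agent would reason over the placeholder-neutralized text, so that group identity cannot influence its prediction.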
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation, model bias/unfairness mitigation, human-AI interaction, value-centered design, robustness, transparency
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis, Theory
Languages Studied: English
Submission Number: 4222