Abstract: Large language models (LLMs) are increasingly deployed in critical sectors such as public health, finance, and governance, necessitating not only functional accuracy but also alignment with societal values. Despite recent advances, LLMs often propagate or amplify biases embedded in their training data, posing significant challenges to fairness. While self-debiasing has shown promise by encouraging an LLM to identify and correct its own biases, relying solely on the intrinsic knowledge of a single LLM may be insufficient for addressing deeply ingrained stereotypes. To overcome this limitation, we propose a novel collective bias mitigation (CBM) framework that alleviates bias through knowledge sharing among diverse LLMs. Our work is the first to explore how to effectively select and organize distinct LLMs to foster more equitable LLM responses. Extensive experiments demonstrate that CBM consistently outperforms the standalone self-debiasing baseline in mitigating biased LLM responses.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias, Fairness
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 502