Collective Bias Mitigation via Model Routing and Collaboration

ACL ARR 2025 May Submission 1167 Authors

16 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) are increasingly deployed in critical sectors such as public health, finance, and governance, necessitating both functional accuracy and societal value alignment. Despite recent advances, LLMs often perpetuate or amplify bias embedded in their training data, posing significant challenges to fairness. While self-debiasing has shown promise by encouraging an LLM to identify and correct its own biases, relying solely on the intrinsic knowledge of a single LLM may be insufficient for addressing deeply ingrained stereotypes. To address this critical limitation, we introduce Collective Bias Mitigation (CBM), a novel framework that significantly alleviates bias by learning fine-grained model behavior and fostering knowledge sharing among a diverse set of LLMs. This work is the first to systematically explore the effective selection and organization of distinct LLMs to cultivate more equitable and fair LLM responses. Extensive experiments show CBM substantially outperforms standalone baselines (e.g., Committee reduces age bias from 0.24 to 0.09). In particular, our Debating and Committee topologies achieve significant bias reduction, with the latter offering an excellent trade-off between mitigation effectiveness and inference cost, highlighting the power of CBM for fairer LLMs.
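The abstract does not spell out how the collaboration loop operates; as a purely illustrative sketch of what a Committee-style topology could look like, the following Python snippet has one LLM draft an answer while peer LLMs critique it for bias before revision. All names here (committee_debias, the LLM callables, the "NO BIAS" feedback convention, the number of rounds) are hypothetical assumptions for illustration, not the authors' actual CBM algorithm.

```python
from typing import Callable, List

# A model is any callable that maps a prompt string to a response string.
LLM = Callable[[str], str]

def committee_debias(prompt: str, drafter: LLM, committee: List[LLM],
                     rounds: int = 2) -> str:
    """Hypothetical committee-topology loop: one LLM drafts an answer,
    peer LLMs critique it for social bias, and the drafter revises.
    Illustrative sketch only; not the paper's reported method."""
    answer = drafter(prompt)
    for _ in range(rounds):
        # Each committee member reviews the current answer for bias.
        critiques = [
            reviewer(
                f"Question: {prompt}\nAnswer: {answer}\n"
                "Identify any social bias or stereotyping in the answer. "
                "Reply 'NO BIAS' if none."
            )
            for reviewer in committee
        ]
        if all("NO BIAS" in c.upper() for c in critiques):
            break  # committee agrees the answer is acceptable
        # Otherwise, feed the collected critiques back to the drafter.
        feedback = "\n".join(critiques)
        answer = drafter(
            f"Question: {prompt}\nPrevious answer: {answer}\n"
            f"Committee feedback:\n{feedback}\n"
            "Rewrite the answer to remove the identified bias."
        )
    return answer
```

A Debating topology would differ mainly in that the models argue for competing revisions rather than critiquing a single draft; the trade-off noted in the abstract is that the committee loop above terminates early once reviewers agree, keeping inference cost lower.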
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias, Fairness
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1167