Collective Bias Mitigation via Model Routing and Collaboration

Published: 10 Jun 2025, Last Modified: 29 Jun 2025
Venue: CFAgentic @ ICML'25 Poster
License: CC BY 4.0
Keywords: Bias Mitigation
TL;DR: A collective bias mitigation framework where multiple large language models collaborate to reduce bias in their responses.
Preprint: yes
Abstract: Large language models (LLMs) are increasingly deployed in critical sectors such as public health, finance, and governance, necessitating both functional accuracy and societal value alignment. Despite recent advances, LLMs often perpetuate or amplify bias embedded in their training data, posing significant challenges to fairness. While self-debiasing has shown promise by encouraging an LLM to identify and correct its own biases, relying solely on the intrinsic knowledge of a single LLM may be insufficient for addressing deeply ingrained stereotypes. To address this critical limitation, we introduce Collective Bias Mitigation (CBM), a novel framework that significantly alleviates bias by learning fine-grained model behavior and fostering knowledge sharing among a diverse set of LLMs. This work is the first to systematically explore the effective selection and organization of distinct LLMs to cultivate more equitable and fair LLM responses. Extensive experiments show that CBM substantially outperforms standalone baselines (e.g., the Committee topology reduces age bias from 0.24 to 0.09). In particular, our Debating and Committee topologies achieve significant bias reduction, with the latter offering an excellent trade-off between mitigation effectiveness and inference cost, highlighting the power of CBM for fairer LLMs.
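To make the Committee topology described above concrete, the sketch below is a minimal, hypothetical illustration of one committee round: several reviewer models critique a draft answer for bias, and a chair model revises it using the pooled critiques. The paper's actual interfaces and prompts are not given in this abstract, so the function `committee_debias`, the `LLM` callable type, and all prompt wording here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a "Committee"-style collective debiasing round.
# The `LLM` type is a placeholder for any chat-completion client; CBM's real
# interfaces, prompts, and aggregation rules are not specified in the abstract.
from typing import Callable, List

LLM = Callable[[str], str]  # maps a prompt string to a model response


def committee_debias(question: str, answerer: LLM, reviewers: List[LLM],
                     chair: LLM) -> str:
    """One illustrative round of committee-based bias mitigation."""
    draft = answerer(question)

    # Each committee member independently critiques the draft for bias.
    critiques = [
        reviewer(
            f"Question: {question}\nDraft answer: {draft}\n"
            "Identify any stereotypes or unfair assumptions (age, gender, "
            "race, etc.) in the draft. Reply 'NONE' if you find none."
        )
        for reviewer in reviewers
    ]

    # Keep only substantive critiques.
    issues = [c for c in critiques if c.strip().upper() != "NONE"]
    if not issues:
        return draft  # committee found no bias; keep the original answer

    # The chair revises the draft using the pooled critiques.
    return chair(
        f"Question: {question}\nDraft answer: {draft}\n"
        "Committee critiques:\n"
        + "\n".join(f"- {c}" for c in issues)
        + "\nRewrite the answer so it is accurate and free of these biases."
    )
```

On this reading, a Committee costs one critique call per reviewer plus one revision call, whereas a Debating topology would iterate multiple exchange rounds, which is consistent with the abstract's claim that Committee offers a favorable trade-off between mitigation effectiveness and inference cost.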
Submission Number: 47