Abstract: Large language models (LLMs) are increasingly deployed in critical sectors such as public health, finance, and governance, necessitating not only functional accuracy but also alignment with societal values. Despite recent advances, LLMs often propagate or amplify biases embedded in their training data, posing significant challenges to fairness. While self-debiasing has shown promise by encouraging an LLM to identify and correct its own biases, relying solely on the intrinsic knowledge of a single LLM may be insufficient for addressing deeply ingrained stereotypes. To overcome this limitation, we propose a novel collective bias mitigation (CBM) framework that alleviates bias through knowledge sharing among diverse LLMs. Our work is the first to explore how to effectively select and organize distinct LLMs to foster more equitable LLM responses. Extensive experiments demonstrate that CBM consistently outperforms the standalone self-debiasing baseline in mitigating biased LLM responses.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Bias, Fairness
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 502