BMIL: Self and Cooperative Bias Mitigation in-the-loop in Large Language Models

ACL ARR 2024 June Submission17 Authors

04 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Recent progress in Large Language Models (LLMs) has demonstrated strong performance across a wide range of Natural Language Processing (NLP) tasks. However, these models also tend to learn and unintentionally amplify harmful societal biases. Existing bias mitigation methods applied at the pre-processing and training stages still face considerable methodological challenges. We propose a novel multi-stage bias mitigation approach called 'Bias Mitigation in-the-loop' (BMIL), which consists of two main strategies: self bias mitigation in-the-loop and cooperative bias mitigation in-the-loop. The first strategy enables an LLM to autonomously assess and reduce its own biases, while the second has multiple LLMs with differing bias levels collaborate to identify and reduce biases through a debate process. Furthermore, we apply these strategies during supervised fine-tuning to alleviate the inherent biases of LLMs. Our experiments with ChatGPT, Gemini, Llama2, Llama3, and Mistral show that BMIL effectively mitigates a broad spectrum of biases and significantly improves the quality of model outputs.
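The abstract does not include implementation details, but the self bias mitigation loop it describes can be pictured roughly as below. This is a minimal sketch under stated assumptions: the `generate` callable, the critique/revise prompt wording, and the `max_rounds` cutoff are all hypothetical illustrations, not the authors' published method.

```python
from typing import Callable

def self_bias_mitigation(
    generate: Callable[[str], str],  # any LLM text-completion function
    prompt: str,
    max_rounds: int = 3,  # assumed cap on self-correction iterations
) -> str:
    """Iteratively ask the model to critique and revise its own answer."""
    answer = generate(prompt)
    for _ in range(max_rounds):
        # Self-assessment step: the model inspects its own output for bias.
        critique = generate(
            f"Identify any harmful societal bias in the following answer "
            f"to the prompt '{prompt}'. Reply 'NONE' if it is unbiased.\n\n"
            f"{answer}"
        )
        if critique.strip().upper().startswith("NONE"):
            break  # the model judges its own output bias-free
        # Self-mitigation step: the model rewrites its answer using the critique.
        answer = generate(
            f"Rewrite the answer to remove the biases identified below, "
            f"preserving the factual content.\n\n"
            f"Answer: {answer}\n\nCritique: {critique}"
        )
    return answer
```

The cooperative strategy would extend this loop so that the critique comes from other LLMs in a debate, rather than from the model itself.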
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/unfairness mitigation, model bias/fairness evaluation
Languages Studied: English
Submission Number: 17