BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

ACL ARR 2024 June Submission302 Authors

09 Jun 2024 (modified: 02 Aug 2024) · ACL ARR 2024 June Submission · Readers: Everyone · License: CC BY 4.0

Abstract: Evaluating bias in LLMs is becoming increasingly crucial as these models develop rapidly. However, existing evaluation approaches rely on fixed-form outputs and cannot adapt to the flexible, open-ended text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in the open-text generations of LLMs. BiasAlert integrates external human knowledge with its inherent reasoning capabilities to detect bias reliably. Extensive experiments demonstrate that BiasAlert significantly outperforms existing state-of-the-art methods, such as GPT-4-as-Judge, in detecting bias. Furthermore, through application studies, we showcase the utility of BiasAlert for reliable LLM fairness evaluation and bias mitigation across various scenarios. The model and code will be publicly released.

Paper Type: Short

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: fairness

Contribution Types: Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 302
