TL;DR: We introduce an uncertainty-aware fairness metric, alongside a new gender-occupation dataset to provide a comprehensive evaluation framework for the fairness of large language models.
Abstract: The recent rapid adoption of large language models (LLMs) highlights the critical need for benchmarking their fairness. Conventional fairness metrics, which focus on discrete accuracy-based evaluations (i.e., prediction correctness), fail to capture the implicit impact of model uncertainty (e.g., higher model confidence about one group over another despite similar accuracy). To address this limitation, we propose an uncertainty-aware fairness metric, UCerF, to enable a fine-grained evaluation of model fairness that is more reflective of the internal bias in model decisions. Furthermore, observing data size, diversity, and clarity issues in current datasets, we introduce a new gender-occupation fairness evaluation dataset with 31,756 samples for co-reference resolution, offering a more diverse and suitable benchmark for modern LLMs. Combining our metric and dataset, we provide insightful comparisons of eight open-source LLMs. For example, Mistral-8B exhibits suboptimal fairness due to high confidence in incorrect predictions, a detail overlooked by Equalized Odds but captured by UCerF. Overall, this work provides a holistic framework for LLM evaluation by jointly assessing fairness and uncertainty, enabling the development of more transparent and accountable AI systems.
Lay Summary: As large language models (LLMs) such as ChatGPT become more broadly deployed and increasingly affect many aspects of society, any fairness issue in LLMs, e.g., gender bias, becomes an urgent problem that can lead to profound harm at scale. Existing fairness evaluation methods are based on the outputs of LLMs. While this is helpful, fairness evaluation becomes more accurate and informative if we also incorporate LLMs' confidence in their outputs, which is what the proposed UCerF metric captures. In addition, we offer a new dataset, SynthBias, that is larger in scale and more suitable for recent LLMs than its predecessor, WinoBias.
UCerF integrates the confidence of LLMs (in other words, estimated model uncertainty) into fairness evaluation by introducing a continuous scale of model behavior preference. From left to right, this scale captures confidently incorrect behavior, unconfident incorrect behavior, unconfident correct behavior, and confidently correct behavior. A model's fairness is captured by the difference in its behavior on this scale when facing stereotypical versus anti-stereotypical scenarios.
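The exact UCerF formulation is given in the paper; as a rough illustration only, the sketch below uses an assumed mapping from (confidence, correctness) to a continuous behavior score and assumed names such as `behavior_score` and `ucerf_like_fairness` to show how a confidence-weighted behavior scale could be aggregated and compared across stereotypical and anti-stereotypical samples.

```python
# Illustrative sketch only -- the mapping and aggregation below are
# assumptions for exposition, not the exact UCerF formula from the paper.
from statistics import mean

def behavior_score(confidence: float, correct: bool) -> float:
    """Map a prediction onto a continuous behavior scale in [0, 1]:
    ~0 = confidently incorrect, ~0.5 = unconfident, ~1 = confidently correct.
    (Assumed mapping for illustration.)"""
    return confidence if correct else 1.0 - confidence

def ucerf_like_fairness(stereo, anti_stereo) -> float:
    """Compare average behavior on stereotypical vs. anti-stereotypical
    samples; a smaller gap (score closer to 1) indicates fairer behavior.
    Each input is a list of (confidence, correct) pairs."""
    gap = abs(mean(behavior_score(c, y) for c, y in stereo)
              - mean(behavior_score(c, y) for c, y in anti_stereo))
    return 1.0 - gap

# Toy example: both groups have the same accuracy (2/3), but the model is
# more confident on the stereotypical group, so the uncertainty-aware
# fairness score drops below 1 even though accuracy-based metrics match.
stereo = [(0.95, True), (0.90, True), (0.85, False)]
anti_stereo = [(0.60, True), (0.55, True), (0.70, False)]
print(ucerf_like_fairness(stereo, anti_stereo))
```

The toy example mirrors the point made above: an accuracy-only metric would treat the two groups as identical, while the confidence-weighted scale exposes the gap.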
With UCerF and SynthBias, we reveal that model confidence (uncertainty) significantly impacts fairness evaluation, i.e., a seemingly fair model may in fact be less fair than it appears because it is highly confident when producing biased outputs. With the new metric and dataset, we offer a better way to ensure LLMs are fair before they are deployed and cause harm.
Link To Code: https://github.com/apple/ml-synthbias
Primary Area: Social Aspects->Fairness
Keywords: Fairness, Uncertainty, Large Language Models
Submission Number: 2669