Keywords: Accountability and Fairness, Tabular dataset, Large Language Models
TL;DR: We distinguish between intrinsic fairness and behavioral fairness with a call to action for behavioral fairness evaluation
Abstract: Large language models (LLMs) are increasingly used to examine tabular datasets and aid decision-making in critical sectors such as clinical medicine. Standard fairness metrics, which were largely designed to evaluate supervised learning models, are not well suited to this setting.
This paper proposes a novel dichotomy between \textit{intrinsic} and \textit{behavioral} fairness, and details a comprehensive framework for evaluating both in LLMs.
The former is encoded in a language model's embeddings through procedures such as pre-training and preference fine-tuning. The latter is revealed in the model's outputs when it is applied in real-world scenarios.
Though current work largely prioritizes intrinsic over behavioral fairness, we argue that the latter is far more important in practice. We illustrate the gap between these two concepts in a series of experiments on a semi-synthetic dataset inspired by a large-scale study of racial bias in health algorithms. Our results suggest a new direction for fairness research in LLMs, as well as practical guidelines to mitigate harmful outcomes.
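As a rough illustration of what a behavioral fairness evaluation might look like (this is not the authors' framework or code), the sketch below treats the LLM as a black box that maps each tabular record to a binary decision and then compares selection rates across a protected attribute, i.e., a demographic parity gap over model behavior. The `query_llm` stub, the `race` and `risk_score` fields, and the toy records are hypothetical placeholders.

```python
# Minimal sketch of a behavioral fairness check on LLM decisions over tabular data.
# Assumptions: a binary decision task and a `query_llm` stub standing in for a real model call.

from collections import defaultdict

def query_llm(record: dict) -> int:
    """Placeholder for an LLM call mapping a serialized tabular record to a 0/1 decision."""
    # In practice this would prompt the model with the record and parse its answer.
    return int(record["risk_score"] > 0.5)

def demographic_parity_gap(records, group_key="race"):
    """Largest difference in positive-decision rates between any two groups."""
    totals, positives = defaultdict(int), defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        positives[g] += query_llm(r)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

if __name__ == "__main__":
    toy_records = [
        {"race": "A", "risk_score": 0.7},
        {"race": "A", "risk_score": 0.4},
        {"race": "B", "risk_score": 0.6},
        {"race": "B", "risk_score": 0.3},
    ]
    gap, rates = demographic_parity_gap(toy_records)
    print(f"selection rates: {rates}, demographic parity gap: {gap:.2f}")
```

The key design point is that the metric is computed over the model's observed decisions on concrete records, rather than over properties of its internal representations, which is what distinguishes a behavioral check from an intrinsic one.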
Submission Number: 51