Abstract: Although large language models (LLMs) such as generative pre-trained Transformers (GPTs) have achieved superior performance on many tasks, they capture and propagate the social biases and stereotypes present in their training data. In this paper, we propose a framework that reformulates bias detection in LLMs as a hypothesis testing problem, with the null hypothesis $H_0$ denoting {\em no bias}. Our framework is designed for contrastive text pairs and offers two schemes: one based on (log-)likelihood and the other on preference. To this end, two public datasets, CrowS-Pairs and its French version, are used, both covering nine categories of bias. Although frequentist methods such as Student's $t$-test and the Wilcoxon test can be employed in our framework, a Bayesian test (Bayes factors) is preferred for bias detection, as it allows practitioners to quantify the evidence for both competing hypotheses. Our framework is applicable to a wide range of large language models, and we demonstrate its application to the popular GPT-3 (text-davinci-003) and ChatGPT (GPT-3.5-Turbo) in our experiments. We find that the bias behavior of ChatGPT is largely consistent across the English and French CrowS-Pairs datasets, but still exhibits some differences attributable to differing social norms.
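The likelihood-based scheme described in the abstract can be sketched as a paired test on log-likelihood differences over contrastive sentence pairs. The following is a minimal illustration, not the paper's actual implementation: the function name `bias_test` and the BIC-approximation Bayes factor (in place of whatever Bayes factor the authors compute) are assumptions for the sake of the sketch.

```python
import numpy as np
from scipy import stats

def bias_test(ll_stereo, ll_antistereo):
    """Test H0: no bias, i.e. the mean paired difference of
    log-likelihoods between stereotyping and anti-stereotyping
    sentences is zero. Illustrative sketch only.

    Returns (t-test p-value, Wilcoxon p-value, BF10).
    """
    d = np.asarray(ll_stereo) - np.asarray(ll_antistereo)
    n = len(d)

    # Frequentist tests mentioned in the abstract.
    _, t_p = stats.ttest_rel(ll_stereo, ll_antistereo)
    _, w_p = stats.wilcoxon(d)

    # Crude Bayes factor via the BIC approximation
    # BF10 ~= exp((BIC0 - BIC1) / 2), an assumption here:
    # the paper may use a different Bayes factor.
    sse0 = np.sum(d ** 2)               # residuals under H0 (mean = 0)
    sse1 = np.sum((d - d.mean()) ** 2)  # residuals under H1 (free mean)
    bic0 = n * np.log(sse0 / n)              # no free mean parameter
    bic1 = n * np.log(sse1 / n) + np.log(n)  # one free parameter
    bf10 = np.exp((bic0 - bic1) / 2.0)
    return t_p, w_p, bf10
```

BF10 > 1 indicates evidence for bias, BF10 < 1 evidence for the null; unlike a p-value, the Bayes factor can therefore quantify support for *either* hypothesis, which is the motivation the abstract gives for preferring it.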
Paper Type: long
Research Area: Ethics, Bias, and Fairness
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English; French