Keywords: Large Language Models, fair NLP, Gender bias, gender bias evaluation
TL;DR: "Do large language models still favor one gender over the other in non-stereotypical settings? We study this question in the gender pronoun setting and show that, surprisingly, 23 popular LLMs still exhibit gender bias in 60-95% of the examples."
Abstract: As large language models (LLMs) are increasingly deployed for a variety of applications, it is imperative to measure and understand how gender biases present in the training data influence model behavior.
Prior work constructs benchmarks around known stereotypes (e.g., occupations) and demonstrates high levels of gender bias in LLMs, raising serious concerns about models exhibiting undesirable behaviors.
We expand on existing literature by asking the question: \textit{Do large language models still favor one gender over the other in non-stereotypical settings?}
To tackle this question, we restrict LLM evaluation to a \textit{neutral} subset, in which sentences are free of pronounced word-gender associations.
After quantifying these associations in terms of pretraining data statistics, we use them to (1) create a new benchmark and (2) adapt popular gender pronoun benchmarks -- Winobias and Winogender -- removing sentences with strongly gender-correlated words.
Surprisingly, when assessing more than $20$ models on the proposed benchmarks, we still detect critically high gender bias across all tested models.
For instance, after adjusting for strong word-gender associations, we find that all models still exhibit clear gender preferences in about $60$%-$95$% of the sentences, a small change (up to $10$%) relative to the original benchmarks.
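To make the filtering step concrete, the following is a minimal sketch of how a neutral subset might be selected. The abstract does not specify the association measure, so the log-ratio score, the threshold, the toy counts, and all names here (`COOC`, `gender_score`, `neutral_subset`) are illustrative assumptions, not the benchmark's actual procedure.

```python
import math

# Hypothetical co-occurrence counts of words with "she" / "he".
# These numbers are made up for illustration, not taken from any corpus.
COOC = {
    "nurse": {"she": 90, "he": 10},
    "engineer": {"she": 15, "he": 85},
    "walked": {"she": 50, "he": 50},
    "book": {"she": 48, "he": 52},
}
PRONOUN_TOTALS = {"she": 1000, "he": 1000}


def gender_score(word, smoothing=1.0):
    """Assumed score: log of P(word | she) / P(word | he), with additive smoothing.

    Positive values lean towards "she", negative towards "he", 0 is neutral.
    """
    counts = COOC.get(word)
    if counts is None:
        return 0.0  # unseen words are treated as neutral in this sketch
    p_she = (counts["she"] + smoothing) / (PRONOUN_TOTALS["she"] + smoothing)
    p_he = (counts["he"] + smoothing) / (PRONOUN_TOTALS["he"] + smoothing)
    return math.log(p_she / p_he)


def is_neutral(sentence, threshold=0.5):
    """A sentence is kept only if every word stays within the threshold."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return all(abs(gender_score(w)) <= threshold for w in words if w)


def neutral_subset(sentences, threshold=0.5):
    """Drop sentences containing any strongly gender-correlated word."""
    return [s for s in sentences if is_neutral(s, threshold)]


if __name__ == "__main__":
    candidates = [
        "The nurse walked home.",
        "Someone walked past the book.",
    ]
    print(neutral_subset(candidates))
```

Under these assumptions, a stricter (smaller) threshold yields a smaller but more neutral subset; the choice of threshold is a design decision that trades benchmark size against how strongly residual associations are controlled.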
Submission Number: 105