Keywords: fairness, uncertainty quantification, intersectionality, stereotypes, social psychology, cognitive psychology
TL;DR: We propose a fairness benchmark to evaluate intersectional stereotypes in LLMs through the lens of uncertainty and demonstrate that LLMs can be used as a tool to promote the study of stereotypes in social psychology
Abstract: Recent work has shown that Large Language Models (LLMs) learn and reproduce pre-existing biases in their training corpora, such as preferences for socially privileged identities (e.g., men or White people) and prejudices against socially marginalized identities (e.g., women or Black people). Current evaluations largely focus on single-attribute discrimination (e.g., gender stereotypes). By contrast, we investigate intersectional stereotypical bias (e.g., against Black women) as these social groups face unique challenges that cannot be explained by any single aspect of their identity alone. Our contributions in this work are two-fold: First, we design and release a new fairness benchmark for intersectional stereotypes in LLMs by augmenting the WinoBias corpus using 25 demographic markers including gender identity, body type, and disability.
We use this benchmark to evaluate the fairness of five causal LLMs through the lens of uncertainty, and find that they are disparately uncertain for intersectional identities on the pronoun-occupation co-reference resolution task, indicating systematic intersectional stereotypical bias. Second, we build on cognitive psychology research on stereotypes in human society, by using LLMs to detect stereotypes against intersectional identities that have previously not been studied in the social sciences. Drawing from the seminal warmth-competence stereotype content model, we compare stereotypes in LLMs to stereotypes produced by human annotators and report statistically significant alignment between the two. Our findings underscore the potential for LLMs to be used to conduct social psychology research that could otherwise be harmful to conduct with human subjects.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7597
Loading