First-Person Fairness in Chatbots

Published: 22 Jan 2025, Last Modified: 11 Feb 2025. ICLR 2025 Spotlight. License: CC BY 4.0
Keywords: fairness, large language models, chatbots
TL;DR: A methodology for evaluating bias in open-ended chat
Abstract: Some chatbots have access to a user’s name when responding. Prior work has shown that large language model outputs can change based on the demographic traits correlated with a name, such as gender or race. In this study, we introduce a scalable method for studying one form of first-person fairness—fairness towards the user based on their demographic information—across a large and heterogeneous corpus of actual chats. We leverage a language model as an AI “research assistant” (AI RA) that can privately and scalably analyze chat data, surfacing broader trends without exposing specific examples to the researchers. We corroborate the labels of the AI RA with independent human annotations, finding it highly consistent with human ratings of gender bias (less so for racial bias). We apply this methodology to a large set of chats with a commercial chatbot. We assess overall quality of responses conditional on different names and also subtle differences in similar-quality responses that may in aggregate reinforce harmful stereotypes based on gender or race. The largest detected biases are gender biases in older generations of models and in open-ended tasks, like writing a story. Finally, evaluations like ours are important for monitoring and reducing biases.
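The evaluation the abstract describes can be pictured as a name-substitution harness: generate responses to the same prompt under names correlated with different demographics, then have an LM grader (the "AI RA") compare response pairs across groups. The sketch below is an illustrative assumption, not the authors' code; the name lists and the `chatbot` and `judge` stubs are placeholders for real model calls.

```python
# Hypothetical sketch of a name-substitution fairness check.
# NAME_GROUPS, chatbot(), and judge() are illustrative stand-ins,
# not part of the paper's released methodology or any real API.

NAME_GROUPS = {
    "group_a": ["Emily", "Sarah"],
    "group_b": ["James", "Michael"],
}

def chatbot(prompt: str, user_name: str) -> str:
    """Stand-in for a chatbot call where the model can see the user's name."""
    return f"Hello {user_name}, here is a short story."  # dummy response

def judge(response_a: str, response_b: str) -> str:
    """Stand-in for the LM 'AI research assistant' that compares two
    responses for quality or stereotype differences: 'a', 'b', or 'tie'."""
    # Dummy rule: compare everything after the greeting.
    tail_a = response_a.split(",", 1)[1]
    tail_b = response_b.split(",", 1)[1]
    return "tie" if tail_a == tail_b else "a"

def compare_groups(prompt: str) -> dict:
    """Generate a response per name, then pairwise-judge across groups."""
    responses = {
        group: [chatbot(prompt, name) for name in names]
        for group, names in NAME_GROUPS.items()
    }
    tallies = {"a": 0, "b": 0, "tie": 0}
    for resp_a in responses["group_a"]:
        for resp_b in responses["group_b"]:
            tallies[judge(resp_a, resp_b)] += 1
    return tallies

print(compare_groups("Write me a short story."))
```

Aggregating judge verdicts over many chats, rather than inspecting individual transcripts, is what lets this kind of analysis run privately at scale, as the abstract notes.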
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8413