Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Published: 16 Jan 2024, Last Modified: 08 Apr 2024 · ICLR 2024 poster
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Bias, Fairness, LLM, Reasoning, Persona, Safety
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Assigning personas to LLMs can bring their deep-rooted biases to the surface, significantly diminishing their reasoning ability across domains.
Abstract: Recent works have showcased the ability of large language models (LLMs) to embody diverse personas in their responses, exemplified by prompts like ‘_You are Yoda. Explain the Theory of Relativity._’ While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs’ capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform _basic reasoning tasks_. Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., ‘an Asian person’) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments reveal that LLMs harbor deep-rooted bias against various socio-demographic groups beneath a veneer of fairness. While they overtly reject stereotypes when explicitly asked (‘_Are Black people less skilled at mathematics?_’), they manifest stereotypical and often erroneous presumptions when prompted to answer questions while adopting a persona. These presumptions can surface as abstentions in the model’s response, e.g., ‘_As a Black person, I am unable to answer this question as it requires math knowledge_’, and generally result in a substantial drop in performance on reasoning tasks. Our experiments with ChatGPT-3.5 show that this bias is _ubiquitous_ (80% of our personas demonstrate bias), _significant_ (some datasets show performance drops of 70%+), and especially _harmful for certain groups_ (some personas suffer statistically significant drops on 80%+ of the datasets). Overall, all four LLMs exhibit persona-induced bias to varying extents, with GPT-4-Turbo showing the least bias, though still a problematic amount (evident in 42% of the personas). Further analysis shows that these persona-induced errors can be hard to discern, as they do not always manifest as explicit abstentions, and hard to avoid: we find de-biasing prompts to have minimal to no effect. Our findings serve as a cautionary tale that the increasingly common practice of assigning personas to LLMs can surface their deep-rooted biases and have unforeseeable and detrimental side-effects.
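
For readers unfamiliar with persona assignment, the sketch below illustrates the kind of setup the abstract describes: a persona is injected via a system prompt before a reasoning question is posed, and the persona-conditioned answer can then be compared against a no-persona baseline. This is a minimal illustration only, assuming the OpenAI chat-completions Python client; the prompt wording, helper name, and model choice are hypothetical and are not taken from the paper.

```python
# Minimal sketch of persona-assigned querying, assuming the OpenAI Python
# client (openai>=1.0). The persona instruction below is a hypothetical
# example, not the exact prompt used by the authors.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_with_persona(persona: str, question: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask a reasoning question while the model adopts the given persona."""
    messages = [
        # The system message assigns the persona (e.g., "a Black person", "Yoda").
        {"role": "system", "content": f"You are {persona}. Answer the user's questions."},
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


# Compare a persona-assigned answer with a no-persona baseline on a simple
# reasoning question, looking for abstentions or degraded answers.
print(ask_with_persona("a physically-disabled person", "What is 17 * 24?"))
```

Repeating such paired queries over many personas and reasoning datasets, and measuring the resulting accuracy gap, is the general pattern behind the performance-drop and abstention findings the abstract reports.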
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Primary Area: societal considerations including fairness, safety, privacy
Submission Number: 8745