Navigating the Ocean of Biases: Political Bias Attribution in Language Models via Causal Structures

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: We used activity dependency networks to explore the causal structure behind LLM biases and decision-making.
Abstract: The rapid advancement of large language models (LLMs) like ChatGPT has sparked intense debate regarding their ability to perceive and interpret complex socio-political landscapes and to handle many other complex, often subjective, tasks. It is clear that LLMs show political bias, but current analyses typically reduce this bias to a single number, leaving us with limited understanding of its actual internal causes. In response, we use US presidential debates as an illustrative case to explore bias and its attribution in LLMs. The goal is to investigate which attributes the model assigns to the individual candidates and how these attributes interact causally to form judgements. One of these attributes is the $\textit{Score}$, which reflects the LLM's perception of a candidate's ability to argue and their chance of winning the election. We then use these attributes to discuss problems with oversimplified mitigation strategies based on naive bias estimations. To achieve this, values between 0 and 1 were assigned to each attribute for each speaker by prompting the LLM with a set of well-chosen questions and subsections of the debates. Based on the partial correlations of these values, activity dependency networks (ADNs) are used to estimate a causal network. The sensitivities expressed by the resulting graph provide insight into the internal decision process of the LLM at an interpretable level of value associations, indicating how LLMs perceive the world and pointing directly at possible sources of bias. For example, in our scenario, we can examine whether the $\textit{Speaker's Party}$ has a direct influence on the perceived $\textit{Score}$. We show how LLM biases can be understood and explained, at least partially, by analyzing such value associations. Based on this, we reason that current perceptions of political bias in LLMs might be overestimated. We warn that mitigation strategies based on limited information can be ineffective or even harmful, leading to unforeseen and undesired side effects because they do not account for the complex interactions between attributes and the wide range of diverse tasks the same models are used for. We emphasize the need for accurate attribution as a precursor to effective mitigation and AI-human alignment.
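The pipeline outlined in the abstract (attribute values in [0, 1] per speaker, partial correlations, a dependency graph read off from them) can be illustrated with a minimal sketch. This is not the authors' code: the attribute names are hypothetical, and the precision-matrix estimator of partial correlations is an assumption about one standard way such values could be computed.

```python
# Minimal sketch, assuming attribute scores in [0, 1] have already been elicited
# from the LLM for each speaker/debate segment. Attribute names are hypothetical.
import numpy as np
import pandas as pd

def partial_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Partial correlation of each attribute pair, controlling for all other
    attributes, obtained from the (pseudo-)inverse of the correlation matrix."""
    corr = df.corr().to_numpy()
    prec = np.linalg.pinv(corr)                      # precision matrix
    scale = np.sqrt(np.outer(np.diag(prec), np.diag(prec)))
    pcorr = -prec / scale                            # rho_ij = -p_ij / sqrt(p_ii p_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pd.DataFrame(pcorr, index=df.columns, columns=df.columns)

# Toy data standing in for LLM-assigned attribute values.
rng = np.random.default_rng(0)
n = 200
party = rng.uniform(0, 1, n)                          # e.g. 0-1 encoding of party
clarity = rng.uniform(0, 1, n)
score = 0.6 * clarity + 0.1 * party + 0.3 * rng.uniform(0, 1, n)
scores = pd.DataFrame({"party": party, "clarity": clarity, "score": score})

print(partial_correlations(scores).round(2))
```

In the paper's setting, the values would come from prompting the LLM on debate subsections; an ADN-style graph would then be derived from these (partial) correlations, for example by normalizing and thresholding them into directed sensitivities between attributes such as $\textit{Speaker's Party}$ and $\textit{Score}$.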
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English