Keywords: Steering, Political Opinions, Bias in LLMs
TL;DR: The inclusion of irrelevant context in LLM prompts significantly steers their political alignments.
Abstract: Several recent works have examined the generations produced by large language models (LLMs) on subjective topics such as political opinions and attitudinal questionnaires. There is growing interest in controlling these outputs to align with specific users or perspectives using model steering techniques. However, multiple studies have highlighted unintended and unexpected steering effects, where minor changes in the prompt or irrelevant contextual cues influence model-generated opinions.
This work empirically tests how irrelevant information can systematically bias model opinions in specific directions. Using the Political Compass Test questionnaire, we conduct a detailed statistical analysis to quantify these shifts in the opinions generated by LLMs in an open-generation setting. The results demonstrate that even seemingly unrelated contexts consistently alter model responses in predictable ways, further highlighting challenges in ensuring the robustness and reliability of LLMs when generating opinions on subjective topics.
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 89