Keywords: AI audits, temporal monitoring, political bias, transparency
TL;DR: We conduct a longitudinal study of LLM responses to political questions to track how their opinions and tone change over time.
Abstract: Large Language Models (LLMs) such as ChatGPT, Gemini, and Claude are increasingly used as sources of information across a variety of topics. These include not only uncontested information (e.g., the GDP of a country) but also information of a political nature, where multiple views may exist (e.g., the effect of tariffs on the economy). As people increasingly rely on LLMs for information on political topics, it is imperative to investigate whether their responses drift politically over time. In this work, we present a longitudinal study of responses to politically relevant queries derived from real-world regulatory changes. We evaluate frontier LLMs from three major providers (Anthropic, Google, and OpenAI) over the course of 36 weeks. Our dataset spans 246 questions from 12 political topics, and we track model outputs for these questions at weekly intervals. Our analysis reveals that, while LLMs generally remain neutral, their responses to political questions show measurable temporal drift along the left-right political spectrum, with an increasing rightward shift. The magnitude of these shifts, while small overall, is more pronounced for certain topics and models and often coincides with new model releases. We also observe that, over time, models express less certainty and hedge more. Our findings highlight the need for continuous auditing and greater transparency in model updates.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 120