Keywords: Large Language Models (LLMs), Political Bias, Credibility Assessment, News Outlets
TL;DR: This study audits nine LLMs and finds that while they rate Bangladeshi news sources with internal consistency, they exhibit moderate alignment with human judgments and reveal political bias favoring pro-government outlets.
Abstract: Large language models (LLMs) are widely used in search engines to provide direct answers, and AI chatbots retrieve updated information from the web. As these systems influence how billions of people access information, evaluating the credibility of news outlets has become crucial. We audit nine LLMs from OpenAI, Google, and Meta to assess their ability to evaluate the credibility and political bias of the top 20 most popular news outlets in Bangladesh. While most LLMs rate the tested outlets, larger models often refuse to rate sources, citing insufficient information, whereas smaller models are more prone to hallucination. We create a dataset of credibility ratings and political identities based on journalism experts' opinions and compare these with LLM responses. We find strong internal consistency in LLM credibility ratings, with an average correlation coefficient (ρ) of 0.72, but only moderate alignment with expert evaluations, with an average ρ of 0.45. Most LLMs (GPT-4, GPT-4o mini, Llama 3.3, Llama 3.1 70B, Llama 3.1 8B, and Gemini 1.5 Pro) in their default configurations favor the left-leaning Bangladesh Awami League, assigning it higher credibility ratings, and show misalignment with human experts. These findings highlight the significant role of LLMs in shaping access to news and political information.
Archival Status: Archival
ACL Copyright Transfer: pdf
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 145