Keywords: Large Language Models (LLMs), Political Bias, Credibility Assessment, News Outlets
TL;DR: This study audits nine LLMs and finds that, while they rate Bangladeshi news sources with strong internal consistency, they align only moderately with human judgments and exhibit a political bias favoring pro-government outlets.
Abstract: Search engines increasingly use large language models (LLMs) to generate direct answers, while AI chatbots retrieve updated information from the
Internet. As information curators for billions of users, LLMs must evaluate the accuracy and reliability of sources. This study audits nine LLMs
from OpenAI, Google, and Meta to assess their ability to evaluate the credibility and quality of the 20 most popular Bangladeshi news outlets.
LLMs rate most of the tested outlets, but larger models more often refuse to rate sources, citing insufficient information, whereas smaller models are
prone to hallucination. When ratings are provided, LLMs show strong internal consistency, with an average correlation coefficient (𝜌) of 0.72, but
their alignment with human expert evaluations is only moderate, with an average 𝜌 of 0.45. To evaluate LLMs’ credibility assessments and political
bias, we introduce a dataset of expert opinions (from journalism and media studies students) on the credibility and political bias of Bangladeshi news outlets.
Our analysis reveals that LLMs in their default configurations favor sources affiliated with the Bangladesh Awami League in credibility ratings. Assigning
partisan identities to LLMs further amplifies politically congruent biases in their assessments. These findings highlight the need to address political
bias and improve credibility evaluations as LLMs increasingly shape how news and political information are curated worldwide.
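As a concrete illustration of the two summary statistics reported above, the sketch below computes an average pairwise Spearman 𝜌 across models (internal consistency) and an average model-versus-human 𝜌 (expert alignment). This is not the authors' code: the model names, the 1-10 scale, and the uniform random scores are placeholders for the study's actual ratings.

```python
# Minimal sketch (not the paper's code) of the abstract's two correlations.
import itertools

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
N_OUTLETS = 20  # the 20 most popular Bangladeshi news outlets

# Hypothetical 1-10 credibility scores, one vector per rater (placeholders).
llm_ratings = {
    "model_a": rng.uniform(1, 10, N_OUTLETS),
    "model_b": rng.uniform(1, 10, N_OUTLETS),
    "model_c": rng.uniform(1, 10, N_OUTLETS),
}
human_ratings = rng.uniform(1, 10, N_OUTLETS)  # placeholder mean expert scores

# Internal consistency: mean Spearman rho over all pairs of models.
pair_rhos = [
    spearmanr(llm_ratings[a], llm_ratings[b]).correlation
    for a, b in itertools.combinations(llm_ratings, 2)
]
print(f"average inter-model rho: {np.mean(pair_rhos):.2f}")

# Expert alignment: mean Spearman rho of each model against the human scores.
human_rhos = [
    spearmanr(scores, human_ratings).correlation
    for scores in llm_ratings.values()
]
print(f"average model-human rho: {np.mean(human_rhos):.2f}")
```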
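The "assigning partisan identities" manipulation can likewise be pictured as prepending a persona to the system prompt before requesting a rating. The persona wording, model name, outlet, and rating scale below are illustrative assumptions rather than the study's actual protocol; the call itself uses the standard OpenAI Python SDK.

```python
# Illustrative sketch only: persona text, model name, and scale are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical partisan identity assigned to the model.
persona = "You are a committed supporter of the Bangladesh Awami League."
question = (
    "On a scale of 1 (not credible) to 10 (highly credible), how credible "
    "is the news outlet 'Prothom Alo'? Reply with a single number."
)

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the paper audits OpenAI, Google, and Meta models
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": question},
    ],
    temperature=0,  # reduce sampling variance for auditing
)
print(resp.choices[0].message.content)
```

Comparing ratings from such persona-conditioned runs against the default (no-persona) configuration is one way to quantify politically congruent shifts.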
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 145