Abstract: We analyze the ability of LLMs to answer comparison questions (e.g., "Which is longer, the Danube or the Nile?"). Our central observation is that LLMs often make mistakes when answering such questions, even when they have the required knowledge (e.g., the length of the rivers involved). We furthermore find that their predictions are heavily influenced by superficial biases, such as the position of the entities in the question, their relative popularity, and shallow co-occurrence statistics. These findings suggest that simple prompting-based strategies may not leverage the ranking abilities of LLMs to their full potential, and that LLMs continue to struggle with even simple reasoning tasks.
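The position-bias probe implied by the abstract can be sketched as follows: ask the same comparison question with the entity order swapped and check whether the model's answer is consistent. This is only a minimal illustration, not the paper's actual setup; `query_llm` is a hypothetical placeholder for whatever model API is used.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual model or API."""
    raise NotImplementedError


def is_consistent(entity_a: str, entity_b: str, relation: str = "longer") -> bool:
    """Return True if the model names the same winner regardless of entity order."""
    q1 = f"Which is {relation}, {entity_a} or {entity_b}? Answer with one name."
    q2 = f"Which is {relation}, {entity_b} or {entity_a}? Answer with one name."
    a1 = query_llm(q1).strip()
    a2 = query_llm(q2).strip()
    # A position-biased model may flip its answer when the entities are swapped.
    return a1 == a2


# Example usage (with a real query_llm in place):
# is_consistent("the Danube", "the Nile")
```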
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: data shortcuts/artifacts
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 5723