Keywords: Retrieval-Augmented Generation, Large Language Models, Evidence Aggregation, Knowledge Conflicts, Illusory Truth Effect, Primacy Bias, RAG Systems, Heuristic Reasoning, Model Robustness, Question Answering
Abstract: Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing $\textit{how}$ models integrate groups of conflicting retrieved evidence remain opaque. Does an LLM answer a certain way because the evidence is factually strong, because of a prior belief, or merely because a claim is repeated frequently? To answer this, we introduce $\textbf{GroupQA}$, a curated dataset of 1,635 controversial questions paired with 15,058 diversely-sourced evidence documents, annotated for stance and qualitative strength. Through controlled experiments, we characterize group-level evidence aggregation dynamics: paraphrasing an argument can be more persuasive than providing distinct, independent supporting evidence; models favor evidence presented first rather than last; and larger models are increasingly resistant to adapting to the presented evidence. Additionally, we find that the explanations LLMs give for their group-based answers are unfaithful. Together, these findings show that LLMs consistently behave as vulnerable heuristic followers, with direct implications for improving RAG system design.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Question Answering, Interpretability and Analysis of Models for NLP, Retrieval-Augmented Language Models
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 8485