Quantifying the Robustness of Retrieval-Augmented Language Models Against Spurious Features in Grounding Data

ACL ARR 2025 May Submission3589 Authors

19 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Robustness has become a critical attribute for deploying retrieval-augmented generation (RAG) systems in real-world applications. Existing research focuses on robustness to explicit noise (e.g., document semantics) but overlooks implicit noise (spurious features). Moreover, previous studies of spurious features in LLMs are limited to specific types (e.g., formats) and narrow scenarios (e.g., ICL). In this work, we statistically demonstrate the presence of spurious features in the RAG paradigm, a robustness problem caused by the sensitivity of LLMs to semantic-agnostic features. We then propose a comprehensive taxonomy of spurious features and empirically quantify their impact through controlled experiments. Our analysis reveals that not all spurious features are harmful; some can even be beneficial. Further evaluation results suggest that spurious features are a widespread and challenging problem in the field of RAG. The code and dataset will be released to facilitate future research.

Paper Type: Long

Research Area: Generation

Research Area Keywords: retrieval-augmented generation; robustness; evaluation

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources

Languages Studied: English

Submission Number: 3589
