Keywords: Retrieval-Augmented Generation, Adaptive Retrieval-Augmented Generation, Query Robustness, LLM, Benchmark
Abstract: Adaptive Retrieval-Augmented Generation (RAG) promises both accuracy and efficiency by triggering retrieval only when needed, and is widely used in practice.
However, real-world queries vary in surface form even with the same intent, and their impact on Adaptive RAG remains under-explored.
We introduce the first large-scale benchmark of diverse yet semantically identical query variations, combining human-written and model-generated rewrites.
Our benchmark enables systematic evaluation of Adaptive RAG robustness across answer accuracy, computational cost, and retrieval decisions.
We discover a critical robustness gap, where small surface-level changes in queries dramatically alter retrieval behavior and accuracy.
Although larger models achieve higher accuracy, their robustness does not improve accordingly.
These findings reveal that Adaptive RAG methods are highly vulnerable to semantics-preserving query variations, exposing a critical robustness challenge.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: open-domain QA, retrieval-augmented generation, benchmarking, automatic evaluation of datasets
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 5730