Keywords: Large Language Models, Retrieval-Augmented Generation, Literature Retrieval
Abstract: Retrieval-Augmented Generation (RAG) improves factuality for scientific question answering, but scientific queries vary systematically in evidence scope. A single static retrieval pipeline faces a granularity–precision tension: strategies that maximize semantic coverage for global synthesis often miss needle-like evidence required for local extraction. We propose AdaLoc, a scope-adaptive evidence localization module. AdaLoc (i) routes queries by predicted scope, (ii) selects scope-specific evidence granularity, and (iii) applies a hierarchical filter cascade to balance semantic and lexical matching. We validate AdaLoc on the public QASPER benchmark, where it achieves competitive performance compared to closed-source baselines while using over 50% fewer tokens and purely open-source components. Furthermore, to rigorously probe the granularity trade-off, we introduce SCISCOPE, a diagnostic dataset of 2000 queries explicitly annotated with evidence scope. On SCISCOPE, AdaLoc substantially outperforms strong RAG baselines and long-context methods, particularly on local queries (improving F1 by over 3.7× compared to full-context baselines, 74.37% vs. 20.17%). These results demonstrate that precise evidence localization is more critical than simply extending context volume.
Paper Type: Long
Research Area: Information Extraction and Retrieval
Research Area Keywords: Information Retrieval and Text Mining, Question Answering
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 4909