Abstract: Retrieval-augmented generation (RAG) and long-context language models (LCLMs) both address context limitations of LLMs in open-domain QA. However, how much external context to retrieve remains an open problem: fixed retrieval budgets risk wasting tokens or omitting key evidence. Existing adaptive methods like Self-RAG and Self-Route rely on iterative LLM prompting and perform well on factoid QA, but struggle with aggregation QA, where the optimal context size is unknown and variable.
We present Adaptive‑k retrieval, a simple and effective single-pass method that selects a query-specific number of passages by applying a threshold to the similarity scores between the query and candidate passages. It requires no model fine-tuning, extra LLM calls, or changes to existing retriever–reader pipelines. On both factoid and aggregation QA benchmarks, Adaptive‑k matches or outperforms fixed‑k baselines while using up to 10x fewer tokens than full-context input, yet still retrieves 70% of relevant passages. It improves accuracy across five LCLMs and two embedding models, highlighting that dynamically adjusting context size leads to more efficient and accurate QA.
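The following is a minimal sketch of the single-pass selection described above: score all candidate passages against the query with an embedding model, then keep only those passages whose similarity clears a threshold. The specific threshold rule, the `threshold` and `max_k` values, and the function name are illustrative assumptions; the abstract only states that a threshold is applied to query–passage similarity scores.

```python
import numpy as np

def adaptive_k_retrieve(query_emb, passage_embs, threshold=0.6, max_k=50):
    """Select a query-specific number of passages by thresholding
    query-passage similarity scores.

    Note: the absolute threshold used here is an assumption for
    illustration; the paper's exact thresholding rule may differ.
    """
    # Cosine similarity between the query and every candidate passage.
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q

    # Rank passages by similarity, keep those above the threshold,
    # and cap the context at max_k passages.
    order = np.argsort(-scores)
    selected = [i for i in order[:max_k] if scores[i] >= threshold]
    return selected, scores[selected]
```

In use, `selected` varies per query: a narrow factoid question may pass only a handful of passages to the reader, while an aggregation question with many relevant passages keeps a larger context, without any extra LLM calls.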
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: retrieval-augmented generation, LLM efficiency, dense retrieval
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 5263