Abstract: Retrieval-Augmented Generation (RAG) improves factuality but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Gating (\textbf{TARG}), a single-shot policy that decides when to retrieve using only a short, no-context draft from the base model. From the draft’s prefix logits, TARG computes lightweight uncertainty scores—mean token entropy, a margin signal derived from the top-1/top-2 logit gap via a monotone link, or small-$N$ variance across a handful of stochastic prefixes—and triggers retrieval only when the score exceeds a threshold. The gate is model-agnostic, adds only tens to hundreds of draft tokens, and requires no additional training or auxiliary heads. On NQ-Open, TriviaQA, and PopQA, TARG consistently pushes the accuracy–efficiency frontier: compared with Always-RAG\footnote{\textsc{Always-RAG}: retrieve for every query; \textsc{Never-RAG}: never retrieve.}, TARG matches or improves EM/F1 while reducing retrieval by 70–90\% and cutting end-to-end latency, and it remains close to Never-RAG in overhead. A central empirical finding is that under modern instruction-tuned LLMs the margin signal is a robust default (entropy compresses as backbones sharpen), with small-$N$ variance offering a conservative, budget-first alternative. We provide ablations over gate type and prefix length and use a $\Delta$-latency view to make budget trade-offs explicit.
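The gating logic described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the draft prefix's logits are available as a NumPy array of shape `(prefix_len, vocab_size)`, and the function names, the sigmoid used as the monotone link for the margin signal, and the threshold values are all illustrative choices.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mean_token_entropy(prefix_logits):
    """Mean Shannon entropy (nats) of the token distributions over the draft prefix."""
    p = softmax(prefix_logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def margin_score(prefix_logits):
    """Uncertainty from the mean top-1/top-2 logit gap, mapped through a
    monotone link (here a sigmoid of the negated gap) so larger = more uncertain."""
    top2 = np.sort(prefix_logits, axis=-1)[:, -2:]   # [second-largest, largest] per position
    gap = float((top2[:, 1] - top2[:, 0]).mean())
    return 1.0 / (1.0 + np.exp(gap))                 # monotone decreasing in the gap

def should_retrieve(prefix_logits, gate="margin", threshold=0.5):
    """Single-shot gate: trigger retrieval only when the uncertainty score
    from the no-context draft prefix exceeds the threshold."""
    score = margin_score(prefix_logits) if gate == "margin" \
        else mean_token_entropy(prefix_logits)
    return score > threshold
```

A sharp (confident) prefix yields a low score and skips retrieval; a flat (uncertain) prefix scores higher and triggers it. The small-$N$ variance gate would follow the same pattern, scoring disagreement across a handful of stochastic prefixes instead.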
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- De-anonymized authors
- Added **Acknowledgments** section:
```latex
\section{Acknowledgments}
The authors thank the anonymous reviewers for their constructive feedback, which helped improve the paper substantially. We also thank colleagues and collaborators for helpful discussions and feedback during the development of this work.
```
- Added **Conflicts of interest** section:
```latex
\section{Conflicts of interest}
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
```
Assigned Action Editor: ~Simone_Scardapane1
Submission Number: 7181