Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long Context Selection
Keywords: Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Long-Context, Context Selection, Structured Reasoning
TL;DR: Pre-Route is a lightweight framework that proactively chooses between retrieval-augmented generation and long-context reading using lightweight metadata before retrieval, delivering interpretable routing, higher performance, and lower cost across multiple benchmarks.
Abstract: Recent advances in large language models (LLMs) have expanded the context window to beyond 128K tokens, enabling long-document understanding and multi-source reasoning. A key challenge, however, lies in choosing between **retrieval-augmented generation (RAG)** and **long-context (LC)** strategies: RAG is efficient but constrained by retrieval quality, while LC supports global reasoning at higher cost and with position sensitivity. Existing methods such as *Self-Route* adopt failure-driven fallback from RAG to LC, but remain passive, inefficient, and hard to interpret. We propose **Pre-Route**, a proactive routing framework that performs structured reasoning *before* answering. Using lightweight metadata (e.g., document type, length, initial snippet), Pre-Route enables task analysis, coverage estimation, and information-need prediction, producing explainable and cost-efficient routing decisions. Our study shows three key findings: (i) LLMs possess latent routing ability that can be reliably activated with guidelines, allowing single-sample performance to approach that of multi-sample (Best-of-N) results; (ii) linear probes reveal that structured prompts sharpen the separability of the "optimal routing dimension" in representation space; and (iii) distillation transfers this reasoning structure to smaller models for lightweight deployment. Experiments on LaRA (in-domain) and LongBench-v2 (OOD) confirm that Pre-Route outperforms Always-RAG, Always-LC, and Self-Route baselines, achieving superior overall cost-effectiveness.
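The routing idea described in the abstract can be sketched in code. The following is a minimal, hypothetical illustration (names such as `build_routing_prompt` and `heuristic_route` are illustrative, not from the paper): it assembles a structured reasoning prompt from lightweight metadata (document type, length, initial snippet), and substitutes a toy heuristic for the LLM's actual routing decision.

```python
# Hypothetical sketch of Pre-Route-style metadata routing. The prompt structure
# (task analysis -> coverage estimate -> information-need prediction -> route)
# mirrors the steps named in the abstract; the heuristic is a toy stand-in for
# the LLM's decision and is NOT the paper's method.
from dataclasses import dataclass

@dataclass
class DocMeta:
    doc_type: str       # e.g. "report", "novel"
    length_tokens: int  # approximate document length
    snippet: str        # initial snippet of the document

def build_routing_prompt(question: str, meta: DocMeta) -> str:
    """Structured prompt asking the model to route BEFORE reading the full document."""
    return (
        "Choose RAG or LC before reading the full document.\n"
        f"Question: {question}\n"
        f"Document type: {meta.doc_type}; length: {meta.length_tokens} tokens\n"
        f"Opening snippet: {meta.snippet[:200]}\n"
        "Step 1: Analyze the task (local lookup vs. global reasoning).\n"
        "Step 2: Estimate whether a few retrieved chunks could cover the answer.\n"
        "Step 3: Predict the information need, then output ROUTE: RAG or ROUTE: LC."
    )

def heuristic_route(question: str, meta: DocMeta) -> str:
    """Toy decision rule: short documents or global-reasoning cues favor LC."""
    global_cues = ("summarize", "overall", "theme", "compare", "throughout")
    if meta.length_tokens <= 4000:
        return "LC"   # short enough to read in full at low cost
    if any(cue in question.lower() for cue in global_cues):
        return "LC"   # global reasoning favors long-context
    return "RAG"      # localized factual lookup favors retrieval

meta = DocMeta("report", 120_000, "Q3 revenue grew 14% year over year...")
print(heuristic_route("What was the Q3 revenue growth?", meta))  # RAG
print(heuristic_route("Summarize the overall findings.", meta))  # LC
```

In the actual framework, the structured prompt would be sent to the LLM (or a distilled smaller model) and its `ROUTE:` output parsed; the heuristic here only stands in for that call so the sketch stays self-contained.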
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8454