Keywords: LLM, Agent, Routing, Efficiency
Abstract: LLM agents achieve strong performance on complex reasoning tasks but incur high latency and compute cost. In practice, many queries fall within the capability boundary of cutting-edge LLMs and do not require full agent execution, making effective routing between LLMs and agents a key challenge.
We study the problem of routing queries between lightweight LLM inference and full agent execution under realistic cold-start settings.
To address this, we propose BoundaryRouter, a training-free routing framework that uses early behavioral experience and rubric-guided reasoning to decide whether to answer a query with direct LLM inference or escalate to an agent. BoundaryRouter builds a compact experience memory by executing both systems on a shared seed set and retrieves similar cases at inference time to guide routing decisions.
To evaluate this method, we introduce RouteBench, a benchmark covering in-domain, paraphrased, and out-of-domain routing settings. Experiments show that BoundaryRouter reduces inference time by 60.6% relative to agent-only execution while improving performance by 28.6% over direct LLM inference, and that it outperforms prompt-based and retrieval-only routing by an average of 37.9% and 8.2%, respectively.
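The routing idea described in the abstract (run both systems on a shared seed set, store the outcomes, then retrieve similar cases at inference time) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not taken from the paper: the `Experience` record, the token-overlap (Jaccard) similarity, the `k` nearest cases, and the majority-vote rule. In particular, the vote stands in for the rubric-guided reasoning step, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class Experience:
    """One seed-set record (hypothetical layout, not the paper's schema)."""
    query: str
    llm_correct: bool    # did direct LLM inference solve the seed query?
    agent_correct: bool  # did full agent execution solve it?


def tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens; a real system would likely use embeddings.
    return set(text.lower().split())


def similarity(a: str, b: str) -> float:
    # Jaccard overlap between token sets (assumed similarity measure).
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_memory(seed_queries, run_llm, run_agent, is_correct):
    # Execute both systems on every seed query and record which ones succeeded.
    return [
        Experience(q, is_correct(q, run_llm(q)), is_correct(q, run_agent(q)))
        for q in seed_queries
    ]


def route(query: str, memory: list[Experience], k: int = 3) -> str:
    # Retrieve the k most similar seed cases.
    nearest = sorted(
        memory, key=lambda e: similarity(query, e.query), reverse=True
    )[:k]
    if not nearest:
        return "agent"  # no experience yet: fall back to the stronger system
    # Simple majority vote (a stand-in for rubric-guided reasoning):
    # answer directly only if the LLM solved most of the similar cases.
    llm_ok = sum(e.llm_correct for e in nearest)
    return "llm" if llm_ok * 2 > len(nearest) else "agent"
```

In this sketch, `run_llm`, `run_agent`, and `is_correct` are caller-supplied callables, so the memory can be built once from a small seed set and reused for every incoming query without any training step.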
Submission Number: 171