Keywords: LLM Routing, Deep Learning
Abstract: Recently, test-time scaling of Large Language Models (LLMs) has emerged as a practical alternative to parameter and data scaling. Reasoning tasks often require large-scale, RLVR-based LLMs, while more economical LLMs can handle simpler tasks. Routing each query to an LLM tailored to its *suitability* (*i.e.*, capability and cost) ensures both usability and efficiency. We introduce LLMRec, which routes each user query to the most suitable LLM without running pre-inference over the candidate LLM zoo. To our knowledge, it is the first to reframe the LLM routing problem as a comprehensive recommendation system (RecSys) task. Our core insight is that an LLM's suitability for a query is a complex, latent signal analogous to user-item preference. LLMRec systematically engineers features for candidate LLMs (intrinsic attributes and capability distributions), queries (general semantics and meta-dimensional information), and context (inference type and cost budgets), and incorporates behavioral features to learn high-order interactions. LLMRec is designed to generalize to out-of-domain datasets and to adapt to new LLMs as the model zoo evolves. We define an evaluation metric based on the Pareto frontier under user-specified cost budgets. Across six datasets, LLMRec achieves an average cost reduction of over 38% while maintaining accuracy, consistently outperforming baselines in converging toward the Pareto frontier.
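To make the routing-as-recommendation framing concrete, here is a minimal Python sketch that scores candidate LLMs for a query and selects one under a user-specified cost budget. The model zoo, capability vectors, query encoder, and cost-penalty weight are all illustrative placeholders (none of these names or values come from the paper), standing in for LLMRec's learned features and interaction model.

```python
import numpy as np

# Hypothetical per-LLM records: an intrinsic cost attribute plus a
# capability vector over meta-dimensions (e.g., reasoning, coding, QA).
# All names and numbers are illustrative, not from the paper.
LLM_ZOO = {
    "large-rlvr":  {"cost_per_1k": 0.060, "capability": np.array([0.9, 0.8, 0.9])},
    "mid-general": {"cost_per_1k": 0.015, "capability": np.array([0.7, 0.7, 0.6])},
    "small-cheap": {"cost_per_1k": 0.002, "capability": np.array([0.4, 0.5, 0.3])},
}

def query_features(query: str) -> np.ndarray:
    """Stand-in for a query encoder: returns per-dimension demand estimates.

    A real router would use learned semantic and meta-dimensional
    features; this toy heuristic just scales demand with query length.
    """
    difficulty = min(len(query) / 200.0, 1.0)
    return np.array([difficulty, difficulty, difficulty])

def route(query: str, cost_budget: float) -> str | None:
    """Score each candidate LLM like a RecSys item and pick the best.

    Score = predicted suitability (capability vs. query demand) minus a
    cost penalty; candidates over the budget are filtered out, mirroring
    evaluation under user-specified cost budgets.
    """
    demand = query_features(query)
    best_name, best_score = None, -np.inf
    for name, llm in LLM_ZOO.items():
        if llm["cost_per_1k"] > cost_budget:
            continue  # respect the user-specified cost budget
        suitability = float(np.dot(llm["capability"], demand))
        score = suitability - 5.0 * llm["cost_per_1k"]  # toy trade-off weight
        if score > best_score:
            best_name, best_score = name, score
    return best_name

print(route("Prove that the sum of two even integers is even.", cost_budget=0.02))
```

In this sketch the routing decision is made from features alone, with no call to any candidate LLM, matching the abstract's point that suitable routing requires no pre-inference over the model zoo.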
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 24785