Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: large language models, routing, cascading, cost-quality tradeoff
TL;DR: We combine cascading and routing into a more powerful model selection method called cascade routing.
Abstract: The availability of a wide range of large language models embedded in various agentic systems has significantly increased the potential of model selection strategies to improve the cost-performance tradeoff. Existing strategies involve either routing, where a single model is chosen per query, or cascading, where increasingly large models are run in sequence until a satisfactory answer is found. However, current approaches face three key limitations: they (1) lack formal proofs of optimality, (2) fail to identify the conditions under which these strategies are most effective, and (3) are unable to combine both paradigms. To address these limitations, we propose *cascade routing*, a unified framework that integrates routing and cascading into a theoretically optimal strategy. Further, we identify good quality estimators as the critical factor for the success of model selection. Finally, we show that cascade routing consistently outperforms the baselines by a large margin.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 48