Universal LLM Routing with Correctness-Based Representation

Published: 05 Mar 2025, Last Modified: 14 Apr 2025
Venue: SCOPE - ICLR 2025 Poster
License: CC BY 4.0
Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: routing, adaptive computation, learning to defer
TL;DR: We propose a principled model routing framework that allows new models to be added to or removed from the serving pool without retraining the routing model.
Abstract: Large language models' advances in capability are accompanied by significant increases in inference cost. Model routing is a simple technique for reducing inference cost, wherein one maintains a pool of candidate LLMs and learns to route each prompt to the smallest feasible LLM. Existing works focus on learning a router for a fixed pool of LLMs. In this paper, we consider the problem of dynamic routing, where new, previously unobserved LLMs are available at test time. We propose a new approach to this problem that represents each LLM as a feature vector derived from its predictions on a set of representative prompts. Building on this representation, we detail an effective cluster-based routing strategy. We prove that the strategy is an estimate of a theoretically optimal routing rule. Experiments on a range of public benchmarks show the effectiveness of the proposal in routing amongst more than 30 unseen LLMs.
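To make the abstract's idea concrete, the following is a minimal sketch of correctness-based routing: each LLM is represented by its average correctness within clusters of representative prompts, and an incoming prompt is routed by the accuracy-versus-cost trade-off in its nearest cluster. All names, shapes, and the cost penalty are illustrative assumptions, not the authors' implementation; the stand-in correctness matrix and embeddings are random placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical sketch, assuming:
#   - correct[i, j] = 1 iff LLM i answers representative prompt j correctly,
#   - per-call costs for each LLM,
#   - an embedding for each prompt (random stand-ins below).
rng = np.random.default_rng(0)
n_llms, n_prompts, dim, n_clusters = 5, 200, 32, 8

prompt_embs = rng.normal(size=(n_prompts, dim))   # stand-in prompt embeddings
correct = rng.random((n_llms, n_prompts)) < 0.6   # stand-in correctness matrix
costs = np.linspace(1.0, 5.0, n_llms)             # stand-in per-call costs

# Cluster the representative prompts; each LLM's feature vector is its
# mean correctness within each cluster.
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(prompt_embs)
llm_features = np.stack([
    np.array([correct[i, km.labels_ == k].mean() for k in range(n_clusters)])
    for i in range(n_llms)
])  # shape: (n_llms, n_clusters)

def route(prompt_emb, cost_weight=0.1):
    """Pick the LLM maximizing estimated accuracy minus a cost penalty,
    using the cluster nearest to the incoming prompt."""
    k = km.predict(prompt_emb.reshape(1, -1))[0]
    scores = llm_features[:, k] - cost_weight * costs
    return int(np.argmax(scores))

# A previously unseen LLM joins the pool by being evaluated once on the
# representative prompts; its cluster-accuracy vector is appended with
# no retraining of the router.
new_correct = rng.random(n_prompts) < 0.8
new_feat = np.array([new_correct[km.labels_ == k].mean() for k in range(n_clusters)])
llm_features = np.vstack([llm_features, new_feat])
costs = np.append(costs, 7.0)

print("routed to LLM:", route(rng.normal(size=dim)))
```

Under these assumptions, adding or removing a model only edits the feature table, which is the dynamic-routing property the abstract claims; the cost penalty is one simple way to instantiate "smallest feasible LLM".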
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 73
