Keywords: LLM Routing
Abstract: As large language models (LLMs) grow in scale and specialization, routing—selecting the best model for a given input—has become essential for efficient and effective deployment. While recent methods rely on increasingly complex learned routing strategies, their dependence on disparate training data and evaluation setups makes comparison and generalization difficult. In this work, we fundamentally rethink LLM routing by questioning whether such complexity is necessary. We show that a well-tuned k-Nearest Neighbors (kNN) approach not only matches but often outperforms state-of-the-art learned routers while being significantly more efficient. To support systematic evaluation, we introduce a suite of standardized routing benchmarks spanning instruction-following, question-answering, and reasoning tasks, as well as the first multi-modal routing dataset involving visual inputs. Our theoretical analysis reveals that the strong locality properties of model performance in embedding space enable simple non-parametric methods to achieve superior routing decisions with lower sample complexity than parametric approaches. These findings challenge the prevailing trend toward sophisticated architectures and demonstrate that simple, interpretable approaches can be surprisingly effective for LLM routing.
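To make the routing recipe the abstract describes concrete, below is a minimal sketch of a kNN router over an embedding space, assuming precomputed prompt embeddings and a table of observed per-model scores on a set of training prompts. The function and variable names (`fit_router`, `route`, `train_scores`, etc.) are illustrative assumptions, not the paper's actual interface or code.

```python
# Minimal kNN-routing sketch (illustrative, not the paper's implementation).
# Assumes: each training prompt has an embedding (from any sentence encoder)
# and a row of observed scores, one per candidate model.
import numpy as np
from sklearn.neighbors import NearestNeighbors


def fit_router(train_embeddings: np.ndarray, k: int = 10) -> NearestNeighbors:
    """Index the training-prompt embeddings for nearest-neighbor lookup."""
    return NearestNeighbors(n_neighbors=k, metric="cosine").fit(train_embeddings)


def route(query_embedding: np.ndarray,
          index: NearestNeighbors,
          train_scores: np.ndarray) -> int:
    """Pick the model with the best average score among the query's k
    nearest training prompts; returns the chosen model's column index."""
    _, neighbor_ids = index.kneighbors(query_embedding.reshape(1, -1))
    neighbor_scores = train_scores[neighbor_ids[0]]      # shape: (k, n_models)
    return int(neighbor_scores.mean(axis=0).argmax())
```

Under the locality property the abstract appeals to, prompts that are close in embedding space tend to favor the same model, so this non-parametric lookup needs no trained router and its only tunable choices are the embedding model, the distance metric, and k.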
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 21199