Keywords: LLM selection, Model Routing
TL;DR: POLLINATOR efficiently routes queries to the right LLM by combining graph-based prediction with dual optimization, improving both accuracy and cost.
Abstract: The rapid growth of the intelligence marketplace has created an abundance of Large Language Model (LLM) producers, each with different cost–performance tradeoffs, making optimal selection challenging and resource-intensive. We present POLLINATOR, a novel router that integrates a frugal, data-efficient predictor with an online dual-based optimizer. The predictor combines graph-based semi-supervised learning with an Item Response Theory (IRT) head, reducing training cost by up to 49% while improving predictive accuracy over the prior state of the art. The optimizer formulates matchmaking as a strongly convex problem, enabling efficient dual-to-primal conversion for real-time serving. Extensive experiments demonstrate that POLLINATOR delivers superior cost–performance tradeoffs: 0.43%-1.5% accuracy gains at 71%-93% of the cost of the state-of-the-art router, 3-5% gains at only 1.9-3% of the cost of the best individual producer, and up to 10.6% higher accuracy at just 0.3%-35.7% of the cost on challenging real-world benchmarks such as BFCL-V3 and MMLU-Pro. Finally, the interpretability of the learned query difficulties and model abilities demonstrates POLLINATOR's effectiveness for dynamic and cost-efficient intelligence matchmaking.
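To make the IRT-based routing idea concrete: the abstract does not specify the exact parameterization of the IRT head or the optimizer, so the following is a minimal sketch assuming a one-parameter (Rasch) model, where the probability a producer answers a query correctly is a sigmoid of producer ability minus query difficulty, and a hypothetical scalar price `lam` stands in for the dual variable trading accuracy against cost.

```python
import math

def irt_success_prob(ability: float, difficulty: float) -> float:
    """One-parameter (Rasch) IRT: P(correct) = sigmoid(ability - difficulty).

    `ability` and `difficulty` are hypothetical latent scores; POLLINATOR's
    actual IRT head may be parameterized differently.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def route(query_difficulty: float,
          producers: dict[str, tuple[float, float]],
          lam: float) -> str:
    """Pick the producer maximizing expected success minus lam * cost.

    `producers` maps name -> (ability, cost); `lam` is a hypothetical
    cost price playing the role of the dual variable in the optimizer.
    """
    return max(
        producers,
        key=lambda p: irt_success_prob(producers[p][0], query_difficulty)
                      - lam * producers[p][1],
    )

# An easy query favors the cheap producer once cost is priced in:
producers = {"big": (2.0, 10.0), "small": (1.0, 1.0)}
print(route(-1.0, producers, lam=0.05))  # -> small
```

This also illustrates the interpretability claim: the learned scalars can be read directly as query difficulty and model ability.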
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16887