Abstract: Rapid advancements in Large Language Models (LLMs) have produced a diverse landscape of models with varying capabilities and costs. No single LLM is optimal for all tasks, which motivates intelligent routing systems that dynamically select the most appropriate model for a given input to balance performance and operational expense. In this study, we propose a novel benchmark-driven LLM routing framework designed to achieve a practical balance between task-specific performance and cost. Unlike prior frameworks such as HybridLLM, RouteLLM, and LLMProxy, which often rely on binary classifiers for query assessment, our multi-stage system performs explicit task profiling with a lightweight classifier LLM that predicts not only the query's category but also a more granular, multi-level difficulty. A key differentiator is our tiered cost-performance model selection strategy, which systematically buckets models into cost percentiles and then selects the best-performing model within the tier appropriate for the predicted task profile, offering a more structured approach to balancing cost and performance. We evaluate the framework using three routing configurations. The Optimum router consistently matches or exceeds the performance of the best individual models on specific tasks while incurring significantly lower total cost.
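The tiered cost-performance selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the model names, pricing, benchmark scores, and the mapping from difficulty level to cost tier are all hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost: float      # hypothetical cost per 1K tokens
    scores: dict     # task category -> hypothetical benchmark score

def route(models, category, difficulty, n_tiers=3):
    """Bucket models into cost-percentile tiers, then pick the best
    scorer for the predicted category within the tier implied by the
    predicted difficulty (harder query -> pricier tier)."""
    ranked = sorted(models, key=lambda m: m.cost)
    tier_size = max(1, len(ranked) // n_tiers)
    tier_idx = min(difficulty, n_tiers - 1)
    tier = ranked[tier_idx * tier_size:(tier_idx + 1) * tier_size]
    tier = tier or ranked[-tier_size:]  # fall back to the top tier
    return max(tier, key=lambda m: m.scores.get(category, 0.0))

# Hypothetical model pool spanning three cost tiers.
models = [
    Model("small-a", 0.1, {"qa": 0.55, "code": 0.40}),
    Model("small-b", 0.2, {"qa": 0.60, "code": 0.45}),
    Model("mid-a",   0.8, {"qa": 0.70, "code": 0.65}),
    Model("mid-b",   1.0, {"qa": 0.68, "code": 0.72}),
    Model("large-a", 4.0, {"qa": 0.85, "code": 0.88}),
    Model("large-b", 6.0, {"qa": 0.90, "code": 0.84}),
]
print(route(models, "qa", difficulty=0).name)    # easy query -> cheap tier: small-b
print(route(models, "code", difficulty=2).name)  # hard query -> top tier: large-a
```

The key design point the abstract highlights is that cost bounds the candidate set first, and benchmark performance breaks ties within that bound, rather than trading the two off in a single score.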
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: LLM routing, LLM efficiency, cost optimization, NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English
Submission Number: 4857