Keywords: Conformal Risk Control, Model Routing, Large Language Models, Cost–Accuracy Trade-off
Abstract: Recent advances in small-scale large language models (LLMs) have shown that compact models can handle an expanding range of natural language and reasoning tasks. This progress opens the door to more affordable AI inference by enabling broader use of cost-efficient models. However, existing approaches often fail to fully exploit small models because the boundaries of their capabilities are fuzzy. In this paper, we propose a risk-controlled routing framework that dynamically selects among models of different scales, with a strong emphasis on maximizing the utility of smaller models. The framework integrates supervised contrastive learning to sharpen the separability of smaller-model capabilities and grounds its routing decisions in conformal risk control, providing theoretical guarantees on system-level routing risk. In extensive experiments, our method consistently outperforms state-of-the-art baselines, achieving an absolute accuracy gain of $\sim 3.49\%$ at equal cost and up to $\sim 36\%$ cost reduction at comparable accuracy.
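To make the conformal risk control guarantee concrete, here is a minimal sketch of threshold calibration for a two-model router. It assumes a single small/large model pair, a 0/1 loss, a scalar confidence score, and that escalated queries incur zero loss; the function name `calibrate_threshold` and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def calibrate_threshold(conf, losses, alpha, B=1.0):
    """Conformal risk control (CRC) calibration for a two-model router.

    Returns the smallest confidence threshold lam such that the
    CRC-adjusted empirical risk of keeping the small model's answer
    whenever conf >= lam stays below the target level alpha. Queries
    below the threshold escalate to the large model, which this sketch
    optimistically treats as incurring zero loss.

    conf:   small-model confidence on n calibration queries
    losses: 0/1 loss of the small model's answers (1 = wrong)
    alpha:  target system-level risk, e.g. 0.1
    B:      upper bound on the loss (1.0 for 0/1 loss)
    """
    conf, losses = np.asarray(conf, float), np.asarray(losses, float)
    n = len(conf)
    for lam in np.sort(np.unique(conf)):           # candidate thresholds, ascending
        kept = conf >= lam                         # queries answered by the small model
        emp_risk = losses[kept].sum() / n          # empirical system risk R_hat(lam)
        if (n * emp_risk + B) / (n + 1) <= alpha:  # CRC inflation of the empirical risk
            return lam                             # inf{lam : adjusted risk <= alpha}
    return np.inf                                  # no feasible lam: always escalate


# Toy usage: calibrate on held-out data, then route a new query.
rng = np.random.default_rng(0)
conf = rng.uniform(size=500)
losses = (rng.uniform(size=500) > conf).astype(float)  # errors likelier at low confidence
lam = calibrate_threshold(conf, losses, alpha=0.1)
route = "small" if 0.8 >= lam else "large"
print(f"threshold={lam:.3f}, query with confidence 0.8 -> {route} model")
```

The loss $\ell_i(\lambda) = \text{losses}_i \cdot \mathbf{1}\{\text{conf}_i \ge \lambda\}$ is non-increasing in $\lambda$, which is the monotonicity condition under which CRC guarantees the expected routing risk at the calibrated threshold is at most $\alpha$.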
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19757