Confident or Seek Stronger: Exploring Uncertainty-Based Small LM Routing

Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Venue: NeurIPS 2025 LLM Evaluation Workshop (Poster)
License: CC BY 4.0
Keywords: Large Language Model, LLM routing, Uncertainty Quantification
Abstract: Small language models (SLMs) are increasingly deployed on edge devices for personalized applications, offering low-latency inference and reduced energy consumption. However, they often struggle with complex queries, producing unreliable responses. Uncertainty-based SLM routing addresses this by offloading low-confidence queries to stronger large language models (LLMs), following the principle "if uncertain, seek stronger support" to improve reliability. While leveraging LLMs enhances accuracy, it also incurs high invocation costs, making it crucial to balance efficiency and efficacy. In this paper, we conduct a comprehensive benchmarking study of uncertainty-driven routing strategies from SLMs to LLMs across 5,000+ settings. Our findings highlight: First, the alignment between uncertainty and correctness under different uncertainty quantification (UQ) methods significantly impacts routing performance. The extracted uncertainty distribution is primarily shaped by the chosen SLM and UQ method, showing minimal dependence on the downstream dataset.
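The core routing principle described in the abstract can be sketched as a simple threshold rule: compute an uncertainty score for the SLM's answer (here, mean per-token predictive entropy, one common UQ choice) and escalate to the LLM when it exceeds a threshold. This is a minimal illustrative sketch, not the paper's method; the function names, the entropy-based UQ choice, and the threshold value are all assumptions.

```python
import math

def token_entropy(logits):
    # Softmax over one token's logits, then Shannon entropy in nats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract max for stability
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def route(per_token_logits, threshold=1.0):
    # Uncertainty = mean predictive entropy over the SLM's generated tokens.
    # Route to the stronger LLM only when uncertainty exceeds the threshold
    # (hypothetical threshold; in practice it is tuned per SLM/UQ pair).
    u = sum(token_entropy(t) for t in per_token_logits) / len(per_token_logits)
    return ("llm" if u > threshold else "slm"), u

# Peaked logits -> low entropy -> keep the SLM's answer.
print(route([[10.0, 0.0, 0.0]])[0])  # slm
# Flat logits -> high entropy (ln 3 ~ 1.10) -> escalate to the LLM.
print(route([[0.0, 0.0, 0.0]])[0])   # llm
```

The threshold directly trades off the cost/accuracy balance the abstract mentions: a lower threshold escalates more queries (higher accuracy, higher LLM invocation cost), a higher one keeps more queries on-device.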
Submission Number: 184