Towards Fair And Comprehensive Evaluation Of Routers In Collaborative LLM Systems

ACL ARR 2026 January Submission161 Authors

22 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large language models, Efficient ML, Query Routing
Abstract: Large language models (LLMs) have achieved remarkable success, but cost and privacy constraints often necessitate deploying smaller models locally while offloading complex queries to cloud-based models. Existing router evaluations are unsystematic: they overlook scenario-specific requirements and out-of-distribution robustness. We propose a principled evaluation framework with three dimensions: router ability, scenario alignment, and cross-domain robustness. Unlike prior work that relies on output probabilities or external embeddings, we utilize internal hidden states, which capture model uncertainty before answer generation. We introduce ProbeDirichlet, a lightweight router that aggregates cross-layer hidden states via input-dependent Dirichlet distributions. Trained on multi-domain data, it generalizes robustly across both in-domain and out-of-distribution scenarios. Our results show that ProbeDirichlet outperforms the best baselines by 16.68% in router ability and 18.86% in high-accuracy scenarios, with strong generalization across heterogeneous tasks and agentic workflows.
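The abstract describes a router that aggregates cross-layer hidden states via input-dependent Dirichlet distributions. The following is a minimal illustrative sketch of that idea, not the authors' implementation: all class and variable names (`ProbeDirichletSketch`, `W_alpha`, `w_probe`) are hypothetical, and the pooling, parameterization, and probe head are assumptions chosen for clarity.

```python
import numpy as np

def softplus(x):
    """Numerically simple softplus; keeps concentrations positive."""
    return np.log1p(np.exp(x))

class ProbeDirichletSketch:
    """Illustrative sketch (not the authors' code): aggregate per-layer
    hidden states of a small local model with input-dependent Dirichlet
    weights, then apply a linear probe that scores whether the query
    should be offloaded to a large cloud model."""

    def __init__(self, n_layers, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Maps a pooled hidden state to one Dirichlet concentration per layer.
        self.W_alpha = rng.normal(scale=0.02, size=(hidden_dim, n_layers))
        # Linear probe on the aggregated cross-layer representation.
        self.w_probe = rng.normal(scale=0.02, size=hidden_dim)

    def route_prob(self, hidden_states):
        """hidden_states: (n_layers, hidden_dim) array for one query.
        Returns an estimated probability of routing to the large model."""
        pooled = hidden_states.mean(axis=0)              # (hidden_dim,)
        # Input-dependent Dirichlet concentrations (strictly positive).
        alpha = softplus(pooled @ self.W_alpha) + 1e-3   # (n_layers,)
        # Use the Dirichlet mean as layer-mixing weights (sums to 1).
        weights = alpha / alpha.sum()                    # (n_layers,)
        aggregated = weights @ hidden_states             # (hidden_dim,)
        logit = aggregated @ self.w_probe
        return 1.0 / (1.0 + np.exp(-logit))              # sigmoid
```

In this reading, the Dirichlet parameterization lets each query weight the model's layers differently before the probe fires, which is one plausible way "input-dependent" aggregation could work; the paper's actual training objective and architecture may differ.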
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Large language models, Efficient ML, Query Routing
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 161