Mastering Multiple-Expert Routing: Realizable $H$-Consistency and Strong Guarantees for Learning to Defer
Abstract: The problem of learning to defer with multiple experts consists of optimally assigning input instances to experts, balancing the trade-off between their accuracy and computational cost. This is a critical challenge in natural language generation, but also in other fields such as image processing and medical diagnostics. Recent studies have proposed surrogate loss functions to optimize deferral, but challenges remain in ensuring their consistency properties. This paper introduces novel surrogate loss functions and efficient algorithms with strong theoretical learning guarantees. We address open questions regarding realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for both single-stage (jointly learning predictor and deferral function) and two-stage (learning only the deferral function with a fixed expert) learning scenarios. For single-stage deferral, we introduce a family of new realizable $H$-consistent surrogate losses and further prove $H$-consistency for a selected member. For two-stage deferral, we derive new surrogate losses that achieve realizable $H$-consistency, $H$-consistency bounds, and Bayes-consistency for the two-expert scenario and, under natural assumptions, the multiple-expert scenario. Additionally, we provide enhanced theoretical guarantees under low-noise assumptions for both scenarios. Finally, we report the results of experiments using our proposed surrogate losses, comparing their performance against existing baselines.
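To fix ideas, a common way to formalize single-stage deferral in the literature (notation assumed here for illustration; it is not taken from this submission) is a target loss that charges a misclassification cost when the model predicts and an expert-specific cost when it defers:

```latex
% Illustrative deferral loss (assumed notation): the predictor h maps an
% input x either to a label in \mathcal{Y} or to one of n_e deferral
% options \mathrm{defer}_1, \dots, \mathrm{defer}_{n_e}; c_j(x, y) is the
% cost of deferring to expert j (e.g., its error plus inference cost).
\ell_{\mathrm{def}}(h, x, y)
  = \mathbb{1}_{h(x) \in \mathcal{Y}}\,\mathbb{1}_{h(x) \neq y}
  + \sum_{j=1}^{n_e} \mathbb{1}_{h(x) = \mathrm{defer}_j}\, c_j(x, y).
```

Because this loss is discontinuous, it is optimized through surrogate losses; the consistency notions above ($H$-consistency, realizable $H$-consistency, Bayes-consistency) characterize when minimizing a surrogate provably minimizes $\ell_{\mathrm{def}}$ itself.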
Lay Summary: Imagine a system using a learning algorithm that, like a person, knows when to tackle a problem itself and when to pass it on to a specialist. For example, a customer service chatbot might handle simple requests instantly but should hand off complex or sensitive issues to a human agent. Similarly, in medical imaging, a fast learning algorithm could screen for common conditions but defer ambiguous cases to a more powerful, but slower and more expensive, algorithm or a human radiologist. This "learning to defer" is crucial for creating efficient and reliable systems that balance speed and accuracy.
The challenge is teaching a learning algorithm how to make this deferral decision optimally. If it defers too often, it loses the benefit of its speed; if it rarely defers, it might make critical mistakes. Previous methods for training this skill have had a key weakness: it was hard to guarantee that the learning algorithm was actually learning the best deferral strategy. The training process might reward the algorithm for behaviors that seem good during training but don't hold up in real-world situations.
Our research solves this problem by developing a new and more principled way to train learning algorithms to defer. We have created new "scoring rules" for the learning algorithm during its training that are provably linked to good real-world performance. These rules ensure that when the algorithm gets a better score in training, it will also make better deferral decisions in practice. We have rigorously proven that our method works under a variety of conditions, both when the learning algorithm learns to solve and defer tasks simultaneously and when it only learns how to defer to a pre-existing expert. Experiments show that our approach is more effective than previous techniques, leading to intelligent systems that can more reliably decide when to ask for help.
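As a minimal sketch of the kind of score-based deferral rule described above (all names and the scoring scheme here are hypothetical, not the paper's implementation): the trained model outputs one score per label and one score per expert, and the highest-scoring option determines whether it answers itself or hands the instance off.

```python
# Illustrative sketch, assuming a score-based deferral rule: the model
# produces a score for each candidate label and a score for each expert;
# whichever option scores highest decides the action. Hypothetical code,
# not the submission's actual method.

def route(label_scores, expert_scores):
    """Return ('predict', label_index) or ('defer', expert_index)."""
    best_label = max(range(len(label_scores)), key=lambda i: label_scores[i])
    best_expert = max(range(len(expert_scores)), key=lambda j: expert_scores[j])
    if label_scores[best_label] >= expert_scores[best_expert]:
        return ("predict", best_label)
    return ("defer", best_expert)

# The model is unsure between its own labels, so it defers to expert 1.
print(route([0.2, 0.5], [0.4, 0.9]))  # -> ('defer', 1)
```

The surrogate losses studied in the paper are what make training such scores principled: their consistency guarantees ensure that driving the surrogate down also improves the quality of these predict-versus-defer decisions.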
Primary Area: Theory->Learning Theory
Keywords: learning to defer, consistency, realizable H-consistency, learning theory
Submission Number: 7316