Abstract: Learning to defer (L2D) aims to optimize human-AI collaboration by allocating each prediction task to either a machine learning model or a human expert, depending on which is most likely to be correct. This allocation decision is governed by a rejector: a meta-model that routes inputs based on estimated success probabilities. In practice, a poorly fitted or otherwise misspecified rejector can jeopardize the entire L2D workflow because of its central role in routing inputs. In this work, we perform uncertainty quantification for the rejector. We use conformal prediction to allow the rejector to output prediction sets or intervals instead of just the binary outcome of ‘defer’ or not. On tasks ranging from image to hate speech classification, we demonstrate that the uncertainty in the rejector translates to safer decisions via two forms of selective prediction.
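To make the idea concrete, the sketch below shows how split conformal prediction can turn a binary rejector's probabilities into prediction sets over {don't defer, defer}. It is a minimal illustration, not the submission's implementation (see the linked repository for that); the function names `conformal_quantile` and `rejector_prediction_set`, the synthetic calibration data, and the 0.1 miscoverage level are all assumptions made for this example.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Compute the split-conformal threshold on a held-out calibration set.

    cal_probs  : (n, 2) rejector probabilities for each calibration input
    cal_labels : (n,) correct rejector decision (0 = don't defer, 1 = defer)
    alpha      : target miscoverage level (e.g. 0.1 for ~90% coverage)
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true decision.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(scores, min(q_level, 1.0), method="higher")

def rejector_prediction_set(test_probs, qhat):
    """Return the set of rejector decisions whose nonconformity score is below qhat."""
    # Decision d is included if 1 - p(d) <= qhat, i.e. p(d) >= 1 - qhat.
    return [d for d in (0, 1) if 1.0 - test_probs[d] <= qhat]

# Illustrative usage with synthetic calibration data: a confident input tends to
# yield a singleton set, while an uncertain one yields {0, 1}, signalling that the
# routing decision itself should be handled cautiously (e.g. via selective prediction).
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet([2, 2], size=500)   # stand-in rejector outputs
cal_labels = rng.integers(0, 2, size=500)     # stand-in ground-truth decisions
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.1)
print(rejector_prediction_set(np.array([0.95, 0.05]), qhat))  # typically [0]
print(rejector_prediction_set(np.array([0.55, 0.45]), qhat))  # typically [0, 1]
```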
Submission Type: Regular submission (no more than 12 pages of main content)
Code: https://github.com/yizirui/conformal_L2D
Assigned Action Editor: ~Manuel_Haussmann1
Submission Number: 6288