Keywords: out-of-distribution, glaucoma, deferral
Abstract: Artificial Intelligence (AI) holds the potential to dramatically
improve patient care. However, it is not infallible, necessitating
human-AI collaboration to ensure safe implementation. One aspect of
AI safety is a model's ability to defer decisions to a human expert
when it is likely to misclassify if operating autonomously. Recent research has
focused on methods that learn to defer by optimising a surrogate loss
function that finds the optimal trade-off between predicting a class label
and deferring. However, during clinical translation, models often face
challenges such as data shift. Uncertainty quantification methods aim to
estimate a model’s confidence in its predictions. Such methods may also
be used as a deferral strategy that does not rely on learning from a specific
training distribution. We hypothesise that models developed to quantify
uncertainty are more robust to out-of-distribution (OOD) input than
learned deferral models that have been trained in a supervised fashion.
To investigate this hypothesis, we conducted an extensive evaluation
study on a large ophthalmology dataset, examining both learned deferral
models and established uncertainty quantification methods, assessing
their performance in- and out-of-distribution. Specifically, we evaluate
their ability to accurately classify glaucoma from fundus images while
deferring cases with a high likelihood of error. We find that uncertainty
quantification methods may be a promising choice for AI deferral.
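
To illustrate the uncertainty-based deferral strategy described in the abstract, below is a minimal sketch of deferral by predictive entropy. It is not the authors' implementation: the function names (`predictive_entropy`, `defer_by_entropy`), the assumption of multiple stochastic forward passes (e.g. MC dropout or an ensemble), and the threshold value are all illustrative choices.

```python
# Minimal sketch (illustrative, not the paper's implementation): defer cases whose
# predictive entropy exceeds a threshold, assuming per-sample class probabilities
# from several stochastic forward passes (MC dropout or ensemble members).
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of the mean predictive distribution.

    probs: array of shape (n_samples, n_passes, n_classes) with softmax outputs.
    """
    mean_probs = probs.mean(axis=1)  # average over stochastic passes / ensemble members
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)

def defer_by_entropy(probs: np.ndarray, threshold: float):
    """Return predicted labels and a boolean mask of cases deferred to the expert."""
    entropy = predictive_entropy(probs)
    predictions = probs.mean(axis=1).argmax(axis=1)
    defer_mask = entropy > threshold  # high uncertainty -> hand over to the clinician
    return predictions, defer_mask

# Toy example: 4 fundus images, 8 stochastic passes, 2 classes (glaucoma / no glaucoma)
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
preds, defer = defer_by_entropy(probs, threshold=0.6)
print(preds, defer)
```

In practice the threshold would be chosen on a validation set (e.g. to meet a target deferral rate), and other uncertainty scores such as mutual information could replace entropy in the same interface.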
Submission Number: 4