Learning Prediction Intervals for Model Performance

Published: 01 Feb 2025, Last Modified: 25 Apr 2025 · AAAI 2021 · CC BY-SA 4.0
Abstract: Understanding model performance on unlabeled data is a fundamental challenge of developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions have prevented their widespread adoption. We address this core problem of performance prediction uncertainty with a method to compute prediction intervals for model performance. Our methodology uses transfer learning to train an uncertainty model to estimate the uncertainty of model performance predictions. We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. We believe this result makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.
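
As a rough illustration only (not the paper's transfer-learning procedure), the sketch below shows the general two-model pattern the abstract describes: one regressor predicts model performance from dataset-level features of unlabeled data, a second "uncertainty" regressor estimates the error of those predictions, and the two are combined into a prediction interval. The meta-features, data, and interval-scaling constant here are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy "meta-dataset": each row summarizes one unlabeled dataset via simple
# drift features (e.g., mean confidence, prediction entropy); the target is
# the base model's true accuracy on that dataset (synthetic values here).
X_meta = rng.uniform(0.0, 1.0, size=(200, 2))
y_acc = 0.6 + 0.3 * X_meta[:, 0] - 0.2 * X_meta[:, 1] + rng.normal(0, 0.03, 200)

# Step 1: performance predictor -> point estimate of accuracy.
perf_model = RandomForestRegressor(n_estimators=100, random_state=0)
perf_model.fit(X_meta, y_acc)

# Step 2: uncertainty model -> predicts the absolute error of the
# performance predictor (trained on in-sample residuals for brevity;
# held-out residuals would be more appropriate in practice).
residuals = np.abs(y_acc - perf_model.predict(X_meta))
unc_model = RandomForestRegressor(n_estimators=100, random_state=1)
unc_model.fit(X_meta, residuals)

# Step 3: prediction interval = point estimate +/- scaled uncertainty.
x_new = np.array([[0.7, 0.4]])                    # features of a new unlabeled dataset
point = perf_model.predict(x_new)[0]
half_width = 1.96 * unc_model.predict(x_new)[0]   # nominal ~95% interval
print(f"predicted accuracy: {point:.3f}  "
      f"interval: [{point - half_width:.3f}, {point + half_width:.3f}]")
```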