Keywords: performance estimation, distribution shifts, model disagreement
TL;DR: We study disagreement notions based on the full predictive distribution to forecast performance under distribution shift.
Abstract: Knowing whether a model will generalize to data "in the wild" is crucial for safe deployment. To this end, we study model disagreement notions that consider the full predictive distribution, specifically disagreement based on the Hellinger distance, the Jensen-Shannon divergence, and the Kullback–Leibler divergence. We find that divergence-based scores provide better test error estimates and detection rates on out-of-distribution data than their top-1 counterparts. Experiments involve standard vision and foundation models.
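To make the contrast between top-1 and divergence-based disagreement concrete, the following is a minimal sketch, not the authors' implementation. It assumes two models that each output a softmax probability matrix of shape (n_samples, n_classes); the function names, the clipping constant `eps`, and the choice to return per-sample scores are all illustrative assumptions, since the abstract does not specify how the scores are aggregated.

```python
import numpy as np


def top1_disagreement(p, q):
    """Fraction of samples on which the two models' argmax labels differ."""
    return float(np.mean(p.argmax(axis=1) != q.argmax(axis=1)))


def hellinger(p, q):
    """Per-sample Hellinger distance between rows of p and q (range [0, 1])."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=1))


def kl_divergence(p, q, eps=1e-12):
    """Per-sample KL(p || q) in nats; eps clipping (an assumption) avoids log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=1)


def jensen_shannon(p, q, eps=1e-12):
    """Per-sample Jensen-Shannon divergence in nats (range [0, log 2])."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m, eps) + 0.5 * kl_divergence(q, m, eps)


# Hypothetical predictions from two models on three samples, two classes.
probs_a = np.array([[0.9, 0.1], [0.6, 0.4], [1.0, 0.0]])
probs_b = np.array([[0.9, 0.1], [0.55, 0.45], [0.0, 1.0]])

print("top-1 disagreement:", top1_disagreement(probs_a, probs_b))
print("Hellinger:", hellinger(probs_a, probs_b))
print("Jensen-Shannon:", jensen_shannon(probs_a, probs_b))
print("KL(a || b):", kl_divergence(probs_a, probs_b))
```

On the second sample the two models pick the same label, so top-1 disagreement registers nothing, while the divergence-based scores are small but nonzero. This is the extra signal that scores based on the full predictive distribution carry.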
Submission Number: 52