Keywords: out-of-distribution detection, transductive, predictive uncertainty, ensembles, ensemble diversity, outlier detection
Abstract: Machine learning models are often used in practice once they achieve good generalization results on in-distribution (ID) holdout data. To be deployed on test data in the wild, they should also detect samples they cannot predict well. We show that current out-of-distribution (OOD) detection algorithms for neural networks produce unsatisfactory results in a variety of OOD detection scenarios, e.g.\ when OOD data consists of unseen classes or corrupted measurements. This paper studies how such ``hard'' OOD scenarios can benefit from tuning the detection method after observing a batch of the test data. This \emph{transductive} setting is relevant when the advantage of even a slightly delayed OOD detection outweighs the financial cost of additional tuning. We propose a novel method that uses an artificial labeling scheme for the test data and early stopping regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch. We show via comprehensive experiments that our approach significantly outperforms both inductive and transductive baselines on difficult OOD detection scenarios, such as unseen classes on CIFAR-10/CIFAR-100, severe corruptions (CIFAR-C), and strong covariate shift (ImageNet vs.\ ObjectNet).
One-sentence Summary: Transductive out-of-distribution detection via ensembles of regularized models that agree on inliers and disagree on outliers.
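The core scoring idea described in the abstract, flagging as OOD the test samples on which the ensemble members contradict each other, can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the function `disagreement_score` and the usage names (`ensemble`, `test_batch`) are hypothetical, and the artificial labeling and early-stopping training steps that produce the ensemble are omitted.

```python
# Minimal sketch (not the paper's exact method): score a test batch by how much
# an ensemble of models disagrees on each sample. Models that agree on inliers
# but contradict each other on outliers give high scores to OOD samples.
import numpy as np

def disagreement_score(ensemble_probs: np.ndarray) -> np.ndarray:
    """ensemble_probs: (n_models, n_samples, n_classes) softmax outputs of each
    ensemble member on the test batch. Returns one OOD score per sample: the
    fraction of model pairs whose argmax predictions differ
    (0 = full agreement, 1 = full disagreement)."""
    preds = ensemble_probs.argmax(axis=-1)              # (n_models, n_samples)
    n_models, n_samples = preds.shape
    disagree = np.zeros(n_samples)
    for i in range(n_models):
        for j in range(i + 1, n_models):
            disagree += (preds[i] != preds[j])
    n_pairs = n_models * (n_models - 1) / 2
    return disagree / n_pairs

# Hypothetical usage: stack predictions of K early-stopped models on a batch,
# then threshold high-disagreement samples as OOD.
# probs = np.stack([m.predict_proba(test_batch) for m in ensemble])
# scores = disagreement_score(probs)
```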
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/arxiv:2012.05825/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=IVwLl0lSfu