Guidelines for the selection, calibration, and evaluation of post-hoc OOD detectors in high-dimensional representations generated by CNN and ViT models in image recognition

Szymon Datko, Kamil Szyc, Tomasz Walkowiak, Henryk Maciejewski

Published: 24 Jul 2026, Last Modified: 20 May 2026. Venue: OpenReview Archive Direct Upload. Readers: Everyone. License: CC BY-NC 4.0

Abstract: Detecting out-of-distribution (OOD) examples is essential for using machine learning in safety-critical applications. Although many OOD detection methods have been proposed, the literature offers no conclusive recommendation on which method to use in practice, because the performance of OOD detectors is benchmark-specific. We comprehensively analyze the performance of parametric and nonparametric post-hoc OOD detectors in the representation spaces generated by several families of CNN and ViT architectures. On this basis, we identify for each deep model the preferred post-hoc OOD detector(s), which consistently outperform the other detectors regardless of the OOD benchmark dataset. We analyze the properties of the high-dimensional representations generated by different deep networks and challenge the assumptions underlying the prominent Mahalanobis distance-based OOD detector with a pooled covariance matrix. We examine the impact of its simplifying assumptions (such as common vs. diagonal covariance matrices) and show that the performance of the Mahalanobis method improves when the simplifications better match the actual characteristics of the data. We also quantify how unreliable probability density estimation in high-dimensional data affects OOD detection, since the Mahalanobis method relies on a multivariate normal (MVN) model fitted to sparse data, and we show how this curse-of-dimensionality phenomenon affects the choice of the OOD detection threshold. Finally, we propose extended metrics for evaluating OOD detectors. Using these metrics, we show that many CNN and ViT models include classes with surprisingly poor OOD generalization despite a high overall AUROC in OOD detection benchmarks.
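For readers unfamiliar with the detector discussed in the abstract, the following is a minimal NumPy sketch of a Mahalanobis distance-based OOD score with a pooled (class-shared) covariance matrix, plus a diagonal variant. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_mahalanobis` and `mahalanobis_score`, the `covariance` argument, and the use of a pseudo-inverse are all hypothetical choices made here.

```python
import numpy as np

def fit_mahalanobis(features, labels, covariance="pooled"):
    """Fit class means and one shared precision matrix on in-distribution features.

    features: (n, d) array of representations; labels: (n,) integer class ids.
    covariance: "pooled" (full shared matrix) or "diagonal" (assumed variant).
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample on its own class mean, then pool across all classes.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / (len(features) - len(classes))
    if covariance == "diagonal":
        # Simplification: keep per-dimension variances, drop correlations.
        cov = np.diag(np.diag(cov))
    # Pseudo-inverse is an assumption here, used for numerical stability
    # when the covariance estimate is ill-conditioned in high dimensions.
    precision = np.linalg.pinv(cov)
    return means, precision

def mahalanobis_score(x, means, precision):
    """OOD score: squared Mahalanobis distance to the nearest class mean."""
    diff = x[:, None, :] - means[None, :, :]  # shape (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diff, precision, diff)
    return d2.min(axis=1)  # larger score = more likely OOD
```

A detection threshold is commonly taken as a high percentile of the scores on held-out in-distribution data (for example `np.percentile(id_scores, 95)`); the abstract notes that this choice is itself affected by unreliable density estimation in high dimensions.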