A Variational Information Theoretic Approach to Out-of-Distribution Detection

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | CC BY 4.0
TL;DR: A theory and new explainable framework for Out-of-Distribution Detection.
Abstract: We present a theory for the construction of out-of-distribution (OOD) detection features for neural networks. We introduce random features for OOD detection through a novel information-theoretic loss functional consisting of two terms: the first, based on the KL divergence, separates the resulting in-distribution (ID) and OOD feature distributions; the second is the Information Bottleneck, which favors compressed features that retain the OOD information. We formulate a variational procedure to optimize the loss and obtain OOD features. Under assumptions on the OOD distributions, one can recover properties of existing OOD features, i.e., shaping functions. Furthermore, we show that our theory can predict a new shaping function that outperforms existing ones on OOD benchmarks. Our theory provides a general framework for constructing a variety of new features with clear explainability.
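As a rough illustration of the KL-based separation term mentioned in the abstract, the sketch below computes the closed-form KL divergence between two univariate Gaussians standing in for the ID and OOD feature distributions. The Gaussian assumption, the function name `gaussian_kl`, and the example parameters are hypothetical choices made for illustration; they are not the paper's loss functional, which also includes an Information Bottleneck term not shown here.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for univariate Gaussians P and Q.

    Illustrative assumption: P models the ID feature distribution and
    Q models the OOD feature distribution.
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical feature statistics: the divergence grows as the
# ID and OOD feature means move apart.
for ood_mean in (0.0, 1.0, 3.0):
    print(ood_mean, gaussian_kl(0.0, 1.0, ood_mean, 1.0))
```

A larger divergence means the two feature distributions overlap less, which is the sense in which a KL term can reward features that separate ID from OOD inputs.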
Lay Summary: We developed a framework and theory for designing out-of-distribution (OOD) detection methods and for rigorously understanding existing approaches. OOD detection is the problem of identifying data that falls outside the statistical distribution of the data on which a neural network (NN) was trained. The NN may erroneously give confident predictions on such OOD data. Our approach formulates OOD features as a function of NN features through a novel loss function that favors extracting OOD-relevant information from the NN features. We showed that our approach explains why, and under what conditions, existing techniques work. We also showed how to use our approach to construct new OOD features, which provably generalize better than competing approaches. Our extensive benchmarking validated this empirically.
Primary Area: General Machine Learning
Keywords: out-of-distribution detection, information theory, variational calculus
Submission Number: 6930