An Information-theoretical Framework for Understanding Out-of-distribution Detection with Pretrained Vision-Language Models

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Out-of-distribution Detection, Pretrained Vision-Language Models, Point-wise Mutual Information
Abstract: Out-of-distribution (OOD) detection, which identifies samples from unknown classes, plays a key role in ensuring the reliability of machine learning models. Among existing OOD detection methods, pre-trained vision-language models have emerged as powerful post-hoc OOD detectors by leveraging both textual and visual information. Despite this empirical success, a formal understanding of their effectiveness is still lacking. This paper bridges the gap by theoretically demonstrating that existing CLIP-based post-hoc methods effectively perform a stochastic estimation of the point-wise mutual information (PMI) between the input image and each in-distribution label. This estimate is then used to construct energy functions that model the in-distribution data. Unlike prior methods, which treat PMI estimation as a single monolithic task, we follow a divide-and-conquer strategy and decompose it into multiple easier sub-tasks by applying the chain rule of PMI; this not only reduces the estimation complexity but also provably raises the estimation upper bound, thereby reducing the underestimation bias. Extensive evaluations on mainstream benchmarks show that our method establishes a new state of the art across a variety of OOD detection setups.
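For reference, the PMI quantities the abstract refers to can be written in standard information-theoretic notation; the specific way the paper conditions and splits the label description is not given here, so the decomposition below is only the generic chain rule of PMI, not necessarily the exact form used by the method:

\mathrm{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\, p(y)} = \log \frac{p(y \mid x)}{p(y)},

\mathrm{PMI}(x; y_1, \dots, y_n) = \sum_{i=1}^{n} \mathrm{PMI}(x; y_i \mid y_{<i}), \qquad \mathrm{PMI}(x; y_i \mid y_{<i}) = \log \frac{p(y_i \mid x, y_{<i})}{p(y_i \mid y_{<i})}.

Under such a decomposition, each sub-task estimates a conditional PMI term rather than the full joint quantity, which is the sense in which the estimation problem becomes easier.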
Primary Area: Other (please use sparingly, only use the keyword field for more details)
Submission Number: 9577