Enhancing Zero-shot OOD Detection with Pre-trained Multimodal Foundation Models

27 Apr 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0
Keywords: OOD detection
Abstract: Out-of-distribution (OOD) detection is essential for the reliable deployment of deep models in real-world scenarios. Advances in pre-trained multimodal foundation models have enabled zero-shot OOD detection using only in-distribution (ID) labels. Recent methods in this direction expand the label space with auxiliary labels to facilitate discrimination between ID and OOD samples. Inspired by a probabilistic formulation based on the Binomial distribution, we identify the key factors that theoretically affect zero-shot OOD detection performance: the cardinality of the auxiliary label set, the similarity between labels and samples, and the uncertainty of the similarity scores. This analysis shows that existing methods, which construct fixed, single-modality auxiliary labels, are inherently limited. To address these issues, we propose Refer-OOD, a framework that adaptively generates, filters, and retrieves multimodal references while explicitly accounting for these factors. It consists of three modules: a reference acquisition module, a feature mapping module, and a decision module. Experiments across multiple benchmarks demonstrate that Refer-OOD consistently improves zero-shot OOD detection with both vision-language models (VLMs) and multimodal large language models (MLLMs).
Supplementary Material: zip
Primary Area: Social and economic aspects of machine learning (e.g., fairness, interpretability, human-AI interaction, privacy, safety, strategic behavior)
Submission Number: 4295