Benchmarking Open-Set Recognition Beyond Vision-Language Pre-training

ICLR 2026 Conference Submission14052 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: open-set recognition, vision-language models
Abstract: Vision-language models (VLMs) with open-vocabulary pre-training can still fail at classification, especially when the granularity of downstream labels misaligns with the supervision seen during pre-training. In such cases, a few-shot training set is needed to define the classification task on demand. Motivated by this, we investigate the performance of VLMs in open-set recognition (OSR), where a VLM is fine-tuned on a few-shot training set to recognize closed-set classes while identifying and rejecting samples from open-set ones, i.e., without compromising its open-vocabulary capabilities.

We design a comprehensive benchmark to study this problem, varying along four key axes: (1) label granularity (fine- vs. coarse-grained classes), (2) the semantic distance of open-set classes from closed-set ones (OSR hardness), constructed using hierarchical taxonomies, (3) the number of training samples per class, and (4) the fine-tuning objective (discriminative vs. likelihood-based). Through systematic evaluation of CLIP-based and diffusion-based VLMs, we find that discriminative approaches are often misaligned with standard OSR hardness metrics, leading to unreliable rejection behavior. In contrast, the likelihood-based paradigm becomes tractable in the context of VLMs, and we propose a simple method based on a likelihood-ratio test that achieves strong OSR performance when given sufficient examples to model class-conditional likelihoods.

Overall, our results demonstrate that OSR remains a relevant and underexplored challenge even in the era of VLMs. We provide actionable insights and a new benchmark to support future research toward more robust open-world recognition.
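The abstract does not spell out the likelihood-ratio test on this page, so the following is only a minimal sketch of one plausible instantiation: score each input by the gap between its best class-conditional log-likelihood log p(x|c) (e.g., a conditional ELBO estimate from a diffusion-based VLM) and an unconditional log-likelihood log p(x) (e.g., the same model with a null prompt), then reject as open-set below a threshold. All names here (`log_p_x_given_c`, `lr_score`, `tau`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-class log-likelihoods log p(x | c) for a batch of inputs,
# shape (batch, num_classes); stand-ins for model-derived estimates.
log_p_x_given_c = rng.normal(loc=-100.0, scale=5.0, size=(8, 10))

# Hypothetical unconditional log-likelihood log p(x) per input, e.g. from
# the same model conditioned on an empty/null prompt.
log_p_x = rng.normal(loc=-102.0, scale=5.0, size=(8,))

# Likelihood-ratio score: how much more likely is x under its best
# closed-set class than under the unconditional model?
best_class = log_p_x_given_c.argmax(axis=1)
lr_score = log_p_x_given_c.max(axis=1) - log_p_x

# Reject as open-set when the ratio falls below a threshold; tau is a free
# parameter, typically tuned on closed-set validation data.
tau = 1.0
is_open_set = lr_score < tau

for b, (c, s, rej) in enumerate(zip(best_class, lr_score, is_open_set)):
    print(f"sample {b}: pred class {c}, LR score {s:.2f}, reject={rej}")
```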
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 14052