HiPOOD: Hierarchical Prompt-Aware Zero-Shot Out-of-Distribution Detection

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Out-of-Distribution Detection, Vision-Language Models, Semantic Label Hierarchy
Abstract: Reliable image recognition systems must not only classify known categories accurately but also detect instances of novel, unseen classes in open-set scenarios. Achieving this in a zero-shot setting—without any training examples—remains a significant challenge. In this paper, we propose a zero-shot out-of-distribution (OOD) detection approach that leverages semantic class hierarchies to enrich each known label with fine-grained subcategory sets, capturing subsumption relationships between classes. To generate these hierarchies, we query a large language model (LLM) with structured prompts, producing semantically coherent candidate subcategories that are subsequently filtered with a lexical ontology to ensure domain alignment. We incorporate the resulting label hierarchy into the classification pipeline of CLIP, a pre-trained vision–language model (VLM). This design enables the model to distinguish fine-grained categories within the known classes and to recognize when an input does not fit any known class—effectively identifying it as an unknown object. Notably, our approach operates in a zero-shot manner, requiring no additional training. Experiments on several standard OOD detection benchmarks show that our method achieves state-of-the-art performance. Furthermore, by organizing predictions within a semantic hierarchy, the model's outputs become more informative and easier to interpret, including for inputs that it flags as unknown.
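The scoring idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: it assumes each known class is represented by CLIP text embeddings of its LLM-generated subcategory prompts, scores an image by its best cosine similarity within each class's subcategory set, and flags the input as unknown when the resulting maximum softmax probability falls below a threshold (an MCM-style confidence score; the paper's exact scoring rule may differ).

```python
import numpy as np

def hierarchical_ood_score(image_emb, class_to_sub_embs, temperature=0.01):
    """Score an image against a label hierarchy (illustrative sketch).

    image_emb:         (d,) CLIP image embedding.
    class_to_sub_embs: list of (k_i, d) arrays, one per known class, holding
                       text embeddings of that class's subcategory prompts
                       (hypothetical structure, assumed for this sketch).
    Returns (predicted class index, max softmax probability).
    """
    img = image_emb / np.linalg.norm(image_emb)
    class_scores = []
    for sub_embs in class_to_sub_embs:
        sub = sub_embs / np.linalg.norm(sub_embs, axis=1, keepdims=True)
        # a class's score is its best-matching subcategory's similarity
        class_scores.append(np.max(sub @ img))
    s = np.array(class_scores) / temperature
    p = np.exp(s - s.max())
    p /= p.sum()
    return int(np.argmax(p)), float(p.max())

def is_ood(confidence, threshold=0.9):
    """Reject as unknown when no known class is a confident match."""
    return confidence < threshold
```

On synthetic embeddings, an in-distribution image that closely matches one class's subcategories yields a near-one confidence, while an image equidistant from all known classes yields a low confidence and is rejected as unknown.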
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18598