Exploring Open-Vocabulary Models for Category-Free Detection

Pablo Garcia-Fernandez, Daniel Cores, Manuel Mucientes

Published: 2025, Last Modified: 28 Feb 2026, CAIP (1) 2025, CC BY-SA 4.0
Abstract: Object detection models typically rely on a predefined set of categories, limiting their applicability in real-world scenarios where object classes may be unknown. In this paper, we propose a novel, training-free framework that enables off-the-shelf open-vocabulary object detectors (OvOD) to perform category-free detection—localizing and classifying objects without any prior category knowledge. Our approach leverages image captioning to dynamically generate descriptive terms directly from the image content, followed by a WordNet-based filtering process to extract semantically meaningful category names. These discovered categories are then embedded and matched with visual region features using a frozen OvOD model to perform detection. We evaluate our method on the COCO dataset in a fully zero-shot setting and demonstrate that it significantly outperforms strong multimodal large language model baselines, achieving an improvement of over 30 AP points. This highlights our method as a promising direction for more adaptive solutions to real-world detection challenges.
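The category-discovery step described in the abstract (caption terms filtered through a lexical hierarchy to keep only object-like nouns) can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_HYPERNYMS` table is a hypothetical stand-in for WordNet, the captions are assumed to come from some image-captioning model, and the function names `is_physical_object` and `discover_categories` are invented for illustration. The resulting names would then be passed as text prompts to a frozen open-vocabulary detector, which is not shown here.

```python
import re

# Hypothetical stand-in for WordNet: maps a noun to its parent (hypernym).
# A real pipeline would query WordNet itself; this toy table is an assumption.
TOY_HYPERNYMS = {
    "dog": "animal",
    "animal": "physical_object",
    "frisbee": "artifact",
    "artifact": "physical_object",
    "park": "location",
    "happiness": "feeling",
}


def is_physical_object(word, root="physical_object"):
    """Walk the toy hypernym chain and report whether it reaches `root`."""
    seen = set()
    while word in TOY_HYPERNYMS and word not in seen:
        seen.add(word)
        word = TOY_HYPERNYMS[word]
        if word == root:
            return True
    return False


def discover_categories(captions):
    """Return sorted, de-duplicated candidate category names from captions."""
    found = set()
    for caption in captions:
        for token in re.findall(r"[a-z]+", caption.lower()):
            # Naive plural handling: strip a trailing "s" only if the
            # singular form is a known entry in the toy lexicon.
            if token.endswith("s") and token[:-1] in TOY_HYPERNYMS:
                token = token[:-1]
            if is_physical_object(token):
                found.add(token)
    return sorted(found)


captions = [
    "A dog catches a frisbee in the park",
    "Two dogs full of happiness",
]
categories = discover_categories(captions)
print(categories)
```

In this toy setting, abstract nouns and scene-level words are dropped because their hypernym chains never reach the assumed `physical_object` root, while object-like nouns survive as candidate category names.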