Keywords: open-vocabulary object detection, prompt tuning, knowledge distillation
Abstract: Open Vocabulary Object Detection (OVD) aims to extend detection to novel classes described only by text, by learning the mapping between images and text on base classes. However, current methods focus on linking the visual regions of target objects to their corresponding category names when learning prompts, ignoring richer contextual information and knowledge shared across categories; this easily leads to overfitting on the known base categories and poor generalization to novel classes. To address these problems, we propose Hierarchical prompts with Context-Aware calibration (HiCA) for open-vocabulary object detection, which integrates high-level semantic and contextual information into the detector from both the linguistic and visual perspectives.
Hierarchical prompts map regions to superior-level semantics that encompass knowledge shared by base and novel classes, thereby enhancing the model's generalization to novel classes. Context-aware calibration exploits the visual context of the image to establish correlations between contextual information and categories, reducing the adverse effect of background regions and further improving generalization to novel classes. Extensive experiments demonstrate that hierarchical prompts with context-aware calibration effectively improve the performance of open-vocabulary detection methods. In particular, on OV-COCO we achieve 57.2% base-class mAP, surpassing the current state of the art by 2.4%, while also achieving the best overall mAP.
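The abstract names two components; the minimal sketch below only illustrates the general idea as we read it, not the authors' implementation. It scores region features against prompt embeddings fused from category-level and superclass-level text, then adds a calibration term from a global image feature. All tensor shapes and the weights `alpha`, `beta`, `tau` are assumptions for illustration.

```python
# Minimal sketch (not the paper's code): hierarchical prompts fused with a
# context-aware calibration term. Shapes and hyperparameters are assumed.
import torch
import torch.nn.functional as F

def hierarchical_context_scores(region_feats, class_embeds, super_embeds,
                                image_context, alpha=0.5, beta=0.3, tau=0.01):
    """
    region_feats:  (R, D) region visual features from the detector
    class_embeds:  (C, D) text embeddings of category-level prompts
    super_embeds:  (C, D) text embeddings of superclass-level prompts
    image_context: (D,)   global image feature used for calibration
    Returns (R, C) classification logits.
    """
    # Fuse category-level and superclass-level prompt embeddings.
    prompt = F.normalize(alpha * class_embeds + (1 - alpha) * super_embeds, dim=-1)
    regions = F.normalize(region_feats, dim=-1)

    # Region-to-prompt similarity.
    sim = regions @ prompt.t() / tau                      # (R, C)

    # Context-aware calibration: bias each class by how well it matches the
    # whole-image context, down-weighting background-driven scores.
    context = F.normalize(image_context, dim=-1)
    context_prior = (context @ prompt.t()) / tau          # (C,)
    return sim + beta * context_prior                     # broadcast over regions

# Toy usage with random features.
R, C, D = 5, 10, 512
logits = hierarchical_context_scores(torch.randn(R, D), torch.randn(C, D),
                                     torch.randn(C, D), torch.randn(D))
print(logits.shape)  # torch.Size([5, 10])
```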
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1799