Abstract: Out-Of-Distribution (OOD) robustness remains
a critical hurdle to deploying deep vision models. Vision-
Language Models (VLMs) have recently achieved groundbreaking results.
VLM-based open-vocabulary object detection extends the capabilities
of traditional object detection frameworks, enabling the recognition and
classification of objects beyond predefined categories. Investigating OOD
robustness in recent open-vocabulary object detection is essential to increase
the trustworthiness of these models. This study presents a comprehensive
robustness evaluation of the zero-shot capabilities of three recent
open-vocabulary (OV) foundation object detection models: OWL-ViT,
YOLO World, and Grounding DINO. Experiments were carried out on the
robustness benchmarks COCO-O, COCO-DC, and COCO-C, which encompass
distribution shifts due to information loss, corruption, adversarial
attacks, and geometrical deformation. The results highlight the challenges to
the models’ robustness and are intended to foster research in this field. Project webpage:
https://prakashchhipa.github.io/projects/ovod_robustness