Zero-shot Object Detection with a Text and Image Contrastive Model

TMLR Paper107 Authors

19 May 2022 (modified: 17 Sept 2024), Withdrawn by Authors, CC BY 4.0

Abstract: We introduce DUCE, a generalizable zero-shot object detector, and BCC, a novel method of bounding-box consolidation for models where traditional non-maximum suppression is insufficient. DUCE combines the zero-shot performance of CLIP (Radford et al., 2021) with a region proposal network (RPN) (Ren et al., 2015) to achieve state-of-the-art results in generalized zero-shot object detection with minimal training. This approach introduces a new challenge: DUCE labels portions of an image with very high confidence, producing numerous high-confidence bounding boxes around an object of interest. In these scenarios, traditional forms of non-maximum suppression fail to reduce the number of bounding boxes. We introduce BCC as a new approach to bounding-box suppression that allows us to navigate this challenge successfully. Together, DUCE and BCC achieve results competitive with other state-of-the-art models on all classes, regardless of whether the RPN was trained on those classes. Our proposed model and new method of bounding-box consolidation represent a novel approach to the zero-shot object detection task.
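To illustrate the failure mode the abstract describes, the following is a minimal sketch of standard greedy IoU-based non-maximum suppression. It is not BCC, whose procedure the abstract does not specify, and it is not taken from the paper. The box coordinates, scores, and the 0.5 threshold are hypothetical values chosen for illustration: a whole-object box, two high-confidence boxes covering parts of the same object, and one near-duplicate of the whole-object box.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS: visit boxes by descending score and keep a box
    only if its IoU with every already-kept box is at or below the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections on a single object (illustrative values only).
boxes = [
    (0, 0, 100, 100),    # whole object
    (0, 0, 40, 40),      # part of the object, nested in the first box
    (60, 60, 100, 100),  # another part of the object
    (2, 2, 100, 100),    # near-duplicate of the whole-object box
]
scores = [0.99, 0.98, 0.97, 0.96]

kept = greedy_nms(boxes, scores)
print(kept)  # [0, 1, 2]
```

Only the near-duplicate is suppressed. Each part box has an IoU of 0.16 with the whole-object box, well under the threshold, so three high-confidence boxes remain on one object. This is the kind of case in which IoU-based suppression alone does not consolidate detections.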

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Marcus_Rohrbach1

Submission Number: 107
