Abstract: We introduce DUCE, a generalizable zero-shot object detector, and BCC, a novel method
of bounding box consolidation for models where traditional non-maximum suppression is
insufficient. DUCE leverages the zero-shot performance of CLIP (Radford et al., 2021)
in combination with a region proposal network (RPN) (Ren et al., 2015) to achieve
state-of-the-art results in generalized zero-shot object detection with minimal training. This approach
introduces a new challenge: DUCE labels portions of an image with very
high confidence, producing numerous high-confidence bounding boxes around an object of
interest. In these scenarios, traditional forms of non-maximum suppression fail to reduce
the number of bounding boxes. We introduce BCC as a new approach to bounding box
suppression that allows us to navigate this challenge. DUCE and BCC are
able to achieve results competitive with other state-of-the-art models for all classes, regardless of
whether the RPN was trained on those classes. Our proposed model and new method of
bounding-box consolidation represent a novel approach to the zero-shot object detection
task.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Marcus_Rohrbach1
Submission Number: 107