Abstract: Current object detectors are limited in vocabulary size due to
the small scale of detection datasets. Image classifiers, on the other hand,
reason about much larger vocabularies, as their datasets are larger and
easier to collect. We propose Detic, which simply trains the classifiers of a
detector on image classification data and thus expands the vocabulary of
detectors to tens of thousands of concepts. Unlike prior work, Detic does
not need complex assignment schemes to assign image labels to boxes
based on model predictions, making it much easier to implement and
compatible with a range of detection architectures and backbones. Our
results show that Detic yields excellent detectors even for classes without
box annotations. It outperforms prior work on both open-vocabulary and
long-tail detection benchmarks. Detic provides a gain of 2.4 mAP for
all classes and 8.3 mAP for novel classes on the open-vocabulary LVIS
benchmark. On the standard LVIS benchmark, Detic obtains 41.7 mAP
whether evaluated on all classes or only on rare classes, closing the gap in
performance for object categories with few samples. For the first time, we
train a detector with all the twenty-one-thousand classes of the ImageNet
dataset and show that it generalizes to new datasets without finetuning.
Code is available at https://github.com/facebookresearch/Detic.