Zero-Shot Object Detection With Transformers

Ye Zheng, Li Cui

2021 (modified: 16 Nov 2022), ICIP 2021

Abstract: Deep learning has significantly improved the precision of object detection when abundant labeled data are available. However, collecting and labeling sufficient data is extremely hard. Zero-shot object detection (ZSD), which aims to simultaneously recognize and localize both seen and unseen objects, has been proposed to address this problem. Recently, the transformer and its variants have proven more effective than conventional methods in many natural language processing and computer vision tasks. In this paper, we study the ZSD task and develop a new framework named zero-shot object detection with transformers (ZSDTR). ZSDTR consists of a head network, a transformer encoder, a transformer decoder, and a vision-semantic-attention tail network. We find that the transformer is very effective at improving the recall of unseen objects and that the tail performs well at discriminating between seen and unseen objects. To the best of our knowledge, ZSDTR is the first method to use the transformer for the ZSD task. Extensive experimental results on various zero-shot object detection benchmarks show that ZSDTR outperforms the current state-of-the-art methods.
