Abstract: Few-Shot Object Detection (FSOD) aims to adapt a detector to unseen classes with only a few training samples. Typical FSOD methods build on Faster R-CNN, which relies on convolutional neural networks to extract image features. However, the pooling operations in these networks, although intended to capture as much image information as possible, inevitably discard image information. We therefore introduce Hybrid Dilated Convolution (HDC) into the backbone network to obtain a larger receptive field while minimizing this loss. In addition, under the k-shot ($k \ge 2$) setting, we observe that previous methods fuse the support features of each category by simple averaging. This treats every support sample equally and clearly under-utilizes the limited samples. To better exploit the given support data, we propose a Support Features Dynamic Fusion (SFDF) module, which uses the correlation between each support feature and the query feature as a weight to fuse the support features adaptively, yielding more robust support clues. Experiments demonstrate that our method achieves state-of-the-art FSOD performance on the public Pascal VOC and MS-COCO datasets.
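The abstract describes placing Hybrid Dilated Convolution in the backbone to enlarge the receptive field without the information loss caused by pooling. Below is a minimal sketch, not the authors' code, assuming PyTorch and an illustrative dilation schedule of (1, 2, 5); the exact rates and where the block sits in the backbone are assumptions.

```python
# Sketch of an HDC-style block: stacked 3x3 convolutions with varying
# dilation rates enlarge the receptive field without any pooling, so the
# spatial resolution of the feature map is preserved. The dilation
# schedule (1, 2, 5) is assumed for illustration only.
import torch
import torch.nn as nn


class HDCBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 5)):
        super().__init__()
        layers = []
        for d in dilations:
            layers += [
                # padding == dilation keeps the spatial size unchanged
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)


# Usage: a 256-channel feature map keeps its resolution while the
# effective receptive field grows.
feat = torch.randn(1, 256, 64, 64)
out = HDCBlock(256)(feat)
print(out.shape)  # torch.Size([1, 256, 64, 64])
```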
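The SFDF module is described as weighting each support feature by its correlation with the query feature instead of averaging. The following is a minimal sketch of that idea, not the authors' implementation, assuming PyTorch, pooled feature vectors, and cosine similarity as the correlation measure (the latter is an assumption).

```python
# Sketch of Support Features Dynamic Fusion: each of the k support feature
# vectors is scored by its similarity to the query feature, the scores are
# softmax-normalized, and the supports are fused by a weighted sum rather
# than a plain average.
import torch
import torch.nn.functional as F


def sfdf_fuse(support_feats: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
    """Fuse k support features of one class, guided by the query feature.

    support_feats: (k, C) pooled support features for a single class.
    query_feat:    (C,)   pooled feature of the query image.
    Returns a single (C,) fused support representation.
    """
    # Correlation between every support feature and the query feature.
    scores = F.cosine_similarity(support_feats, query_feat.unsqueeze(0), dim=1)  # (k,)
    # Turn correlations into fusion weights that sum to one.
    weights = torch.softmax(scores, dim=0)                                       # (k,)
    # Weighted sum: supports that better match the query contribute more.
    return (weights.unsqueeze(1) * support_feats).sum(dim=0)                     # (C,)


# Example with k = 3 support shots and 256-d features.
support = torch.randn(3, 256)
query = torch.randn(256)
fused = sfdf_fuse(support, query)
print(fused.shape)  # torch.Size([256])
```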