Abstract: Scale variation is one of the challenges in object detection. In this paper, we design a Multi-Level Feature Fusion Pyramid Network (MLFFPN) that fuses features with different receptive fields to produce reliable object representations that are robust to scale variation. Specifically, we extract features from the backbone network with convolutional kernels of different sizes and reconstruct feature pyramids with various receptive fields by adding top-down paths and lateral connections. The reconstructed feature pyramids are then fused. Finally, a bottom-up path enhancement is added for the final prediction. To verify the proposed method, we construct a large-scale object detection dataset containing 16,000 images and a total of 225,944 instances across 30 classes of common objects. We integrate MLFFPN into an object detection network and conduct a series of experiments on our dataset and the MS COCO dataset. Without bells and whistles, MLFFPN achieves a considerable detection improvement over the baseline network.
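The four stages named in the abstract (multi-kernel feature extraction, top-down paths with lateral connections, fusion, and bottom-up enhancement) can be illustrated with a toy, single-channel sketch in plain Python. This is not the authors' implementation. Every operator here is an assumption made for illustration: a mean filter stands in for a learned convolution, upsampling is nearest-neighbour, fusion is an element-wise mean (the abstract does not specify the fusion operator), and the kernel sizes (1, 3, 5) and all function names are hypothetical.

```python
# Toy, single-channel sketch of the pipeline described in the abstract.
# All operators are illustrative stand-ins, not the paper's actual layers.

def box_filter(fmap, k):
    """Same-padded k x k mean filter (stand-in for a learned k x k convolution)."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [fmap[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(fmap):
    """Stride-2 subsampling."""
    return [row[::2] for row in fmap[::2]]

def add(a, b):
    """Element-wise sum of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_down(levels):
    """Levels are ordered fine to coarse; each level adds the upsampled coarser one."""
    out = [None] * len(levels)
    out[-1] = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        out[i] = add(levels[i], upsample2x(out[i + 1]))  # lateral + top-down
    return out

def bottom_up(levels):
    """Bottom-up enhancement: each level adds the downsampled finer one."""
    out = [levels[0]]
    for i in range(1, len(levels)):
        out.append(add(levels[i], downsample2x(out[-1])))
    return out

def mlffpn_sketch(backbone_levels, kernel_sizes=(1, 3, 5)):
    # One reconstructed pyramid per kernel size (i.e. per receptive field).
    pyramids = [top_down([box_filter(f, k) for f in backbone_levels])
                for k in kernel_sizes]
    # Fuse the pyramids level by level with an element-wise mean (assumed).
    fused = []
    for lvl in range(len(backbone_levels)):
        maps = [p[lvl] for p in pyramids]
        h, w = len(maps[0]), len(maps[0][0])
        fused.append([[sum(m[i][j] for m in maps) / len(maps)
                       for j in range(w)] for i in range(h)])
    return bottom_up(fused)

# Three backbone levels at resolutions 8x8, 4x4, and 2x2, filled with ones.
backbone = [[[1.0] * n for _ in range(n)] for n in (8, 4, 2)]
outputs = mlffpn_sketch(backbone)
shapes = [(len(f), len(f[0])) for f in outputs]
```

In this sketch each output level keeps the resolution of its backbone level while receiving information from every other level and every kernel size, which is the property the abstract attributes to the fused pyramid.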
External IDs: dblp:journals/vc/GuoSLZW23