Abstract: Object detection, a core task in computer vision, is crucial for traffic management, emergency response, autonomous driving, and smart cities. Despite significant progress in the field, detecting small objects remains challenging: their low resolution provides little visual information, makes discriminative features hard to extract, and leaves them susceptible to interference from environmental factors. To address these challenges, we propose SF-DETR, a new model designed specifically for scenarios with small targets. First, we design a new backbone network based on partial channel self-attention to replace the backbone of RT-DETR. It captures both local and global contextual information when extracting and enhancing input features, thereby improving the perception of small targets. Second, to strengthen the model's feature representation and better preserve the details of small objects, we propose a Feature Enhancement and Refinement (FER) module. It incorporates a bidirectional fusion mechanism between high-resolution and low-resolution features, which allows more comprehensive information transfer between features and further improves multi-scale feature fusion. Finally, we introduce an efficient IoU method (PIoU) that simplifies computation, speeds up convergence, and improves detection accuracy. SF-DETR significantly improves the detection of small targets and outperforms widely used models on various metrics, while substantially reducing model parameters and computational cost compared to RT-DETR. Compared to RT-DETR-R34, our model improves mAP@0.5 and mAP@0.5:0.95 by 3.3% and 2.4%, respectively, on the VisDrone2019 test set.
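The abstract relies on IoU both for the proposed PIoU method and for the mAP@0.5 and mAP@0.5:0.95 metrics. As a minimal sketch of the plain IoU quantity only (not the PIoU formulation, whose details the abstract does not give), the following assumes axis-aligned boxes in `(x1, y1, x2, y2)` format; the function name `box_iou` and the example boxes are hypothetical and chosen purely for illustration. It shows that the same 2-pixel localization error costs a small box far more IoU than a large one.

```python
def box_iou(a, b):
    """Plain IoU of two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative helper only; this is standard IoU, not PIoU.
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp at zero so disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Hypothetical boxes: the same 2-pixel horizontal shift applied to a
# 10x10 box and to a 100x100 box.
small_iou = box_iou((0, 0, 10, 10), (2, 0, 12, 10))
large_iou = box_iou((0, 0, 100, 100), (2, 0, 102, 100))

print(f"small box IoU: {small_iou:.3f}")
print(f"large box IoU: {large_iou:.3f}")
```

Under these assumptions the small box's IoU falls to roughly two thirds while the large box stays above 0.95, so a prediction that still counts as a match at a strict IoU threshold for a large object can be rejected for a small one.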
External IDs: dblp:conf/cscwd/GaoMZWLD25