Abstract: With their superior maneuverability and flexibility, unmanned aerial vehicles (UAVs) can handle a wide range of complex scenarios and challenges. However, because small objects in UAV images are tiny, carry limited information, and are often severely occluded, their subtle features and semantic information are easily lost. This paper proposes a partial attention fusion-based detection transformer (PAF-DETR) tailored to small object detection in UAV images. It employs a partial attention fusion module to capture potential small object regions, strengthens feature connections by integrating a feature alignment module with a CSP-Rep fusion module, and incorporates a dynamic upsampling module. First, the features extracted by the backbone are fed into an attention-driven context-aware encoder and an auxiliary branch. Second, within the encoder, an attention-based internal scale interaction mechanism is applied only to the highest-level feature. A bidirectional fusion strategy then fuses high-level semantic information with low-level features, while the dynamic upsampling module dynamically refines the sampling points; the same strategy also propagates detailed information from low-level features to high-level features. Third, the feature alignment module in the auxiliary branch uses the dynamic upsampling module to align the features from the last three stages of the backbone, and the CSP-Rep fusion module injects them into the corresponding features processed by the encoder. Finally, the decoder and head generate category probabilities and bounding box coordinates as the final detection predictions. PAF-DETR-50 achieves a mAP50-95 of 31.3% and a mAP50 of 52% on the VisDrone dataset, demonstrating its effectiveness in small object detection. The code is available at https://github.com/lei12879/PAF-DETR.
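The reported metrics can be read with the following minimal sketch, which assumes the conventional COCO-style definition in which mAP50 uses a single IoU threshold of 0.50 and mAP50-95 averages over the thresholds 0.50, 0.55, ..., 0.95. The function names and the example boxes are hypothetical and are not taken from the PAF-DETR code; the sketch only illustrates why a small localization error costs far more for a small object than for a large one.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Thresholds at which the prediction would count as a true positive."""
    value = iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if value >= t]


# Hypothetical boxes: the same 2-pixel shift applied to a small and a large object.
small_gt, small_pred = (100, 100, 116, 116), (102, 102, 118, 118)
large_gt, large_pred = (100, 100, 260, 260), (102, 102, 262, 262)

print(round(iou(small_pred, small_gt), 3), matched_thresholds(small_pred, small_gt))
print(round(iou(large_pred, large_gt), 3), len(matched_thresholds(large_pred, large_gt)))
```

In this example the 16-pixel object passes only the three loosest thresholds, while the 160-pixel object passes all ten, which is one reason mAP50-95 tends to be much lower than mAP50 on datasets dominated by small objects.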
External IDs: dblp:journals/tjs/LeiRWY25