DASSF: Dynamic-Attention Scale-Sequence Fusion for Aerial Object Detection

Published: 01 Jan 2025, Last Modified: 01 Aug 2025, Venue: CVM (1) 2025, License: CC BY-SA 4.0
Abstract: The detection of small objects in aerial images is a fundamental task in computer vision. Moving objects in aerial photography vary in shape and size, overlap densely, are occluded by the background, and are often blurred. The original YOLO method, however, has low overall detection accuracy on such images because it perceives targets of different scales poorly. To improve detection accuracy for densely overlapping small targets and blurred targets, this paper proposes a dynamic-attention scale-sequence fusion method (DASSF) for small target detection in aerial images. First, we propose a dynamic scale sequence feature fusion (DSSFF) module that improves the upsampling mechanism and reduces computational load. Second, we add an extra-small (x-small) object detection head to strengthen the detection of small targets. Finally, to improve the representation of targets of different types and sizes, we use the dynamic head (DyHead). The proposed model addresses small target detection in aerial images and is general: it can be applied to multiple versions of the YOLO method. Experimental results show that applying DASSF to YOLOv8 improves mean Average Precision (mAP) by 10.2% on VisDrone-2019 and by 4.2% on DIOR compared to YOLOv8n, surpassing current mainstream methods. In addition, when DASSF is integrated into different versions of the YOLO model, detection performance on aerial images improves significantly over the corresponding baseline models.
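For readers less familiar with the mAP metric quoted in the abstract, the following is a minimal sketch of how average precision is commonly computed for one class and then averaged over classes. It is not the paper's evaluation code: the function names are hypothetical, the all-point interpolation scheme is an assumption, and the abstract does not say which IoU threshold(s) its mAP figures use. The sketch also assumes each detection has already been matched against ground truth and flagged as a true or false positive.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class at one IoU threshold.

    scores: confidence of each detection (hypothetical input format).
    is_true_positive: True if the detection matched a ground-truth box.
    num_ground_truth: number of ground-truth boxes for this class.
    """
    if num_ground_truth == 0:
        return 0.0

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    precisions, recalls = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Replace each precision by the maximum precision at any higher recall,
    # which makes the precision-recall curve non-increasing.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


# Toy example: three detections, two ground-truth objects.
ap_example = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
```

Under this definition, a gain reported in mAP reflects better ranking of correct detections across all classes, so missed small objects lower the score through reduced recall.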