Abstract: Unmanned aerial vehicles are increasingly popular due to their ease of operation, low noise, and portability. However, existing object detection methods perform poorly on small objects that are densely arranged or sparsely distributed in aerial images. To tackle this issue, we enhanced the general-purpose object detector YOLOv5 and introduced a multi-scale detection method called Detach-Merge Attention YOLO (DMA-YOLO). Specifically, we proposed a Detach-Merge Convolution (DMC) module and embedded it into the backbone network to maximize feature retention. Furthermore, we embedded the Bottleneck Attention Module (BAM) into the detection head to suppress interference from complex backgrounds without significantly increasing computational complexity. To represent and process multi-scale features more effectively, we integrated an extra detection head and restructured the neck network into a Bi-directional Feature Pyramid Network (BiFPN). Finally, we adopted SCYLLA-IoU (SIoU) as the loss function to accelerate model convergence and improve detection precision. A series of experiments on the VisDrone2019 and UAVDT datasets demonstrates the effectiveness of DMA-YOLO. Code is available at https://github.com/Yaling-Li/DMA-YOLO.