CDNet: object detection based on cross-level aggregation and deformable attention for UAV aerial images

Tianxiang Huo, Zhenqi Liu, Shichao Zhang, Jiening Wu, Rui Yuan, Shukai Duan, Lidan Wang

Published: 01 Jan 2025, Last Modified: 15 Oct 2025, Vis. Comput. 2025, CC BY-SA 4.0

Abstract: Object detection in unmanned aerial vehicle (UAV) imagery is a crucial task. However, the presence of densely packed small objects, together with significant scale and shape variations among objects in aerial images, poses challenges for detection. To address these issues, this paper proposes a cross-level deformable feature aggregation network (CDNet). First, a high-resolution characterization enhancement with deep reduction (HCEDR) structure is designed to extract small object location details at high resolution while reducing redundant deep interference. Furthermore, a cross-level fusiform feature aggregation (CFFA) structure is proposed to fuse multi-scale cross-level feature information with the spatial detail information of dense small objects. Moreover, to address object shape variations caused by varying aerial viewpoints, a deformable attention bottleneck (DAB) module is designed to enhance the model’s boundary sensitivity for irregularly shaped objects in aerial scenes. Finally, a new bounding box loss function (inner-WIoU) is proposed, which not only mitigates the detrimental gradient contributions from extreme samples, but also adjusts the auxiliary bounding box dimensions to better fit the ground-truth object bounding boxes, consequently enhancing the model’s performance. To validate the model’s superiority, extensive experiments were conducted on the VisDrone2021 and TinyPerson datasets, achieving mAP\(_{50}\) improvements of 10.8% and 3.6%, respectively, compared to baseline methods. Compared to other advanced methods, CDNet achieves superior detection performance. The code is available at https://github.com/htxhuo/CDNet.
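
The abstract does not state the inner-WIoU formula, so the following is only a minimal sketch of how such a loss might be assembled. It assumes that inner-WIoU combines the published Inner-IoU idea (IoU computed on auxiliary boxes that share the original centers but have width and height scaled by a ratio) with the distance-based attention term of Wise-IoU v1. The function names, the `(cx, cy, w, h)` box layout, and the default `ratio=0.75` are hypothetical choices for illustration and are not taken from the CDNet code. The dynamic focusing mechanism of later Wise-IoU versions, which is the part that down-weights extreme samples, is omitted here.

```python
import math


def _corners(box, ratio=1.0):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2), scaling w and h by ratio."""
    cx, cy, w, h = box
    half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
    return cx - half_w, cy - half_h, cx + half_w, cy + half_h


def inner_iou(pred, gt, ratio=0.75):
    """IoU of the auxiliary boxes (assumed Inner-IoU formulation)."""
    px1, py1, px2, py2 = _corners(pred, ratio)
    gx1, gy1, gx2, gy2 = _corners(gt, ratio)
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    return inter / union if union > 0 else 0.0


def inner_wiou_v1(pred, gt, ratio=0.75):
    """Hypothetical inner-WIoU: Wise-IoU v1 attention times (1 - inner IoU)."""
    # Smallest box enclosing the ORIGINAL (unscaled) boxes.
    px1, py1, px2, py2 = _corners(pred)
    gx1, gy1, gx2, gy2 = _corners(gt)
    enclose_w = max(px2, gx2) - min(px1, gx1)
    enclose_h = max(py2, gy2) - min(py1, gy1)
    # Squared distance between box centers.
    dist_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    # Distance attention; in a training framework the denominator would be
    # detached from the gradient graph, which plain Python cannot express.
    attention = math.exp(dist_sq / (enclose_w ** 2 + enclose_h ** 2))
    return attention * (1.0 - inner_iou(pred, gt, ratio))
```

In this sketch, a ratio below 1 shrinks the auxiliary boxes and a ratio above 1 enlarges them, so nearby but non-overlapping boxes can still produce a non-zero inner IoU. The repository linked in the abstract should be consulted for the authors' actual definition.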