A Small Object Detection Framework on UAV Images via Attentive Representation Learning and Attentional Feature Fusion

Published: 2025, Last Modified: 13 Jan 2026. Signal Image Video Process. 2025. License: CC BY-SA 4.0
Abstract: Object detection in Unmanned Aerial Vehicle (UAV) images has diverse applications and is attracting increasing research interest. Despite the success of detection in natural scenes, UAV images pose two distinctive challenges that limit the performance of existing methods: a high prevalence of small objects and significant variation in object scale. To address these challenges, we propose Att-YOLO, a novel small object detection model for UAV images that improves YOLOv7 with attentive representation learning and attentional feature fusion. First, during feature extraction, we introduce an attentive representation learning module that combines a spatial attention mechanism to highlight foreground features with a channel attention module to reduce background noise. Second, we design an attentional feature fusion strategy that leverages multi-scale feature correlations and assigns dynamic weights to better integrate cross-layer information, which is crucial for handling scale variation. Third, to improve small object detection, we extend the Generalized Efficient Layer Aggregation Network (GELAN) with Swin Transformer blocks, enabling the model to capture both local and global features effectively. Additionally, Wise-IoU (WIoU) v3 is used as the bounding box regression loss to improve localization precision. Extensive experiments show that Att-YOLO is competitive with state-of-the-art methods, reaching a mean Average Precision (mAP) of 41.8% on the VisDrone2019 dataset and 28.2% on the UAVDT dataset, while maintaining superior AP\(_{50}\) and favorable computational efficiency.
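The idea of fusing cross-layer features with dynamic weights can be illustrated with a minimal sketch. The abstract does not specify how Att-YOLO computes its fusion weights, so everything below is an assumption made for illustration: the function names (`global_avg_pool`, `attentional_fuse`), the sigmoid gate, and the hand-set `scale` and `bias` parameters are hypothetical stand-ins for what would be learned layers in a real model. The sketch only shows the general pattern of deriving a per-channel weight from the inputs and using it to blend a shallow and a deep feature map of the same shape.

```python
import math


def global_avg_pool(fmap):
    """Average each channel of a C x H x W nested list down to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attentional_fuse(shallow, deep, scale=1.0, bias=0.0):
    """Blend two same-shape feature maps with per-channel dynamic weights.

    Hypothetical gate: the weight for each channel is a sigmoid of the
    difference between the pooled responses of the two inputs. A trained
    model would replace this with learned layers; this is NOT the
    paper's formulation.
    """
    pooled_s = global_avg_pool(shallow)
    pooled_d = global_avg_pool(deep)
    fused, weights = [], []
    for c, (ch_s, ch_d) in enumerate(zip(shallow, deep)):
        # Weight depends on the inputs, so it changes from image to image.
        w = sigmoid(scale * (pooled_s[c] - pooled_d[c]) + bias)
        weights.append(w)
        fused.append([
            [w * a + (1.0 - w) * b for a, b in zip(row_s, row_d)]
            for row_s, row_d in zip(ch_s, ch_d)
        ])
    return fused, weights


# Usage: one channel, 2 x 2 spatial size.
shallow = [[[2.0, 2.0], [2.0, 2.0]]]
deep = [[[0.0, 0.0], [0.0, 0.0]]]
fused, weights = attentional_fuse(shallow, deep)
```

Because the weight is a convex coefficient in (0, 1), each fused value stays between the corresponding shallow and deep values; a fixed 0.5 weight would reduce this to plain averaging, which is the baseline that input-dependent weighting is meant to improve on.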