Enhancing Aerial Pedestrian Detection via High-Resolution P2 Feature Integration in YOLOv12

Published: 11 May 2026, Last Modified: 11 May 2026, Venue: AERO-HPR 2026 Poster, License: CC BY 4.0
Track: Proceedings Track
Keywords: Pedestrian detection, UAVs, Aerial imagery, Attention mechanisms
Abstract: Detecting pedestrians in UAV (Unmanned Aerial Vehicle) imagery is challenging because of significant scale variation, low resolution, dense crowds, and cluttered backgrounds. In aerial views, pedestrians often occupy only a few pixels, which makes them difficult to detect with standard object detection architectures that rely on high-level feature maps. Most modern detectors begin predictions at a stride of 8, which limits their ability to detect extremely small objects. In this study, we revisit the feature pyramid architecture of YOLOv12 and introduce a high-resolution P2 detection head to improve supervision at the early stages of the network. The proposed modification extends the pyramid to a stride of 4 and incorporates bidirectional feature refinement to maintain semantic consistency across scales. The design remains lightweight and preserves practical inference speed while improving the representation of tiny objects. We evaluate our approach on two aerial pedestrian benchmarks: VisDrone and TinyPerson. On the VisDrone dataset, the proposed model improves the mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) from 0.63 to 0.69, a 9.5% relative gain. On the TinyPerson dataset, mAP@0.5 increases from 0.40 to 0.45, a 12.5% relative gain. Additionally, the tiny-scale AP50 rises from 0.24 to 0.30, a 25% relative increase. The experimental results show consistent improvements in detection performance, particularly for small and tiny pedestrians, without significant computational overhead. Ablation studies further confirm that early-resolution detection is crucial for enhancing recall on small objects in UAV imagery. These findings indicate that revisiting the starting level of feature pyramids is a straightforward yet effective strategy for improving small-object detection in aerial scenarios.
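The effect of moving the first detection level from stride 8 to stride 4, and the relative gains quoted in the abstract, can be illustrated with a small Python sketch. This is not the authors' code: the 640x640 input size and the 12-pixel pedestrian height are assumptions chosen for illustration, and the helper names are hypothetical. Only the stride values and the metric pairs come from the abstract.

```python
def feature_map_size(image_size, stride):
    """Return the (height, width) of a detection grid at the given stride."""
    height, width = image_size
    return (height // stride, width // stride)


def object_extent_in_cells(object_pixels, stride):
    """Return how many grid cells an object of the given pixel size spans."""
    return object_pixels / stride


def relative_gain(baseline, improved):
    """Return the relative improvement over the baseline, in percent."""
    return (improved - baseline) / baseline * 100.0


# Assumed values for illustration only; neither is stated in the abstract.
IMAGE_SIZE = (640, 640)
PEDESTRIAN_HEIGHT_PX = 12

# Stride 4 is the added P2 level; stride 8 is the usual first level (P3).
for level, stride in (("P2", 4), ("P3", 8)):
    grid = feature_map_size(IMAGE_SIZE, stride)
    cells = object_extent_in_cells(PEDESTRIAN_HEIGHT_PX, stride)
    print(f"{level}: stride {stride}, grid {grid}, pedestrian spans {cells:.1f} cells")

# Metric pairs (baseline, proposed) as reported in the abstract.
reported = {
    "VisDrone mAP@0.5": (0.63, 0.69),
    "TinyPerson mAP@0.5": (0.40, 0.45),
    "Tiny-scale AP50": (0.24, 0.30),
}
for name, (baseline, improved) in reported.items():
    print(f"{name}: {relative_gain(baseline, improved):.1f}% relative gain")
```

Under these assumptions the stride-4 grid has twice as many cells along each axis as the stride-8 grid, so the same pedestrian covers twice as many cells in height, which is the intuition behind adding a higher-resolution head for tiny objects.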
Supplementary Material: zip
Submission Number: 4