PCAF: UAV scenarios detector via pyramid converge-and-assign fusion network

Published: 01 Jan 2025 · Last Modified: 01 Aug 2025 · Multim. Syst. 2025 · CC BY-SA 4.0
Abstract: Unmanned Aerial Vehicles (UAVs) are widely used in military surveillance, rescue operations, traffic monitoring, and other fields thanks to their flexibility, low cost, and autonomous flight capability. However, because of factors such as flight altitude and shooting angle, objects in aerial images occupy few pixels and appear in dense, cluttered scenes, which degrades detection performance. In this study, we propose a Pyramid Converge-and-Assign Fusion network (PCAF), which uses a pyramid strategy to fuse the multiscale feature maps extracted by the backbone layer by layer from the top down, and then exchanges and fuses the pyramid-fused information across scales through a Converge-and-Assign fusion mechanism. First, a CSP-ELAN backbone serves as the feature extractor, producing feature maps at multiple scales from the input image. Second, the pyramid module fuses these feature maps layer by layer from the top down. Then, the feature maps of different scales are directly connected through Converge-and-Assign fusion to exchange information. Finally, Focaler-IoU is applied to focus on different regression samples, and six detection heads produce the final predictions. Our model performs well on air-to-ground detection tasks and remains stable and accurate in complex scenes. Experimental results show that PCAF achieves 32.0% mAP50:95 on VisDrone, surpassing the YOLOv8 baseline by 4.8%. On the RGB/infrared modalities of DroneVehicle, it reaches 57.3%/61.5% mAP50:95, which is 3.1%/1.7% higher than the baseline.
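To make the described pipeline concrete, the sketch below shows one plausible reading of it in PyTorch: top-down pyramid fusion of backbone features, a converge step that gathers all scales at a common resolution, an assign step that redistributes the fused context back to every scale, and a Focaler-IoU-style re-mapping of IoU values. The module names, channel widths, number of scales, and the exact converge/assign wiring are illustrative assumptions for this minimal sketch, not the paper's implementation.

```python
# Minimal sketch of a pyramid converge-and-assign fusion block (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


def focaler_iou(iou: torch.Tensor, d: float = 0.0, u: float = 0.95) -> torch.Tensor:
    """Focaler-IoU-style re-mapping: IoU is linearly rescaled over the interval
    [d, u] and clipped, so training can emphasise easy or hard regression samples.
    The interval bounds d and u are hyper-parameters (values here are assumptions)."""
    return ((iou - d) / (u - d)).clamp(0.0, 1.0)


class PyramidConvergeAssign(nn.Module):
    """Top-down pyramid fusion followed by a converge-and-assign exchange.

    Assumes the backbone (e.g. a CSP-ELAN stack) yields several feature maps of
    decreasing resolution, projected here to a shared channel width `c`.
    """

    def __init__(self, in_channels, c=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(ci, c, 1) for ci in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(c, c, 3, padding=1) for _ in in_channels)
        # converge: all scales concatenated at a reference resolution
        self.converge = nn.Conv2d(c * len(in_channels), c, 1)
        # assign: one conv per scale to adapt the shared context
        self.assign = nn.ModuleList(nn.Conv2d(c, c, 3, padding=1) for _ in in_channels)

    def forward(self, feats):
        # 1) project backbone maps to a common channel width
        p = [proj(f) for proj, f in zip(self.proj, feats)]

        # 2) pyramid fusion: top-down, layer by layer
        for i in range(len(p) - 2, -1, -1):
            p[i] = p[i] + F.interpolate(p[i + 1], size=p[i].shape[-2:], mode="nearest")
        p = [s(x) for s, x in zip(self.smooth, p)]

        # 3) converge: resize every scale to a middle reference scale and fuse
        size = p[len(p) // 2].shape[-2:]
        ctx = self.converge(
            torch.cat([F.interpolate(x, size=size, mode="nearest") for x in p], dim=1)
        )

        # 4) assign: broadcast the fused context back to every scale
        out = []
        for i, x in enumerate(p):
            ctx_i = F.interpolate(ctx, size=x.shape[-2:], mode="nearest")
            out.append(self.assign[i](x + ctx_i))
        return out  # one fused map per detection head
```

With six backbone scales as inputs, this block would emit six fused maps, matching the six detection heads mentioned in the abstract; the actual PCAF design may differ in how the converge and assign stages are wired.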