Global-Local Fusion With Semantic Information Guidance for Accurate Small Object Detection in UAV Aerial Images

Published: 01 Jan 2025 · Last Modified: 30 Jul 2025 · IEEE Trans. Geosci. Remote Sens. 2025 · License: CC BY-SA 4.0
Abstract: In recent years, the rapid development of unmanned aerial vehicle (UAV) technology has produced large volumes of UAV-captured aerial images, and object detection in these images has become an active research focus. However, because UAVs fly at varying heights and capture scenes from diverse angles, UAV aerial images pose two significant challenges: extreme variation in target scale and the presence of numerous small targets. To address these challenges, this article introduces a semantic information-guided fusion module tailored to small targets. The module uses high-level semantic information to guide and align low-level texture information, enriching the semantic representation of small targets at the feature level and thereby improving the model's ability to detect them. In addition, this article proposes a novel global–local fusion detection strategy to further strengthen small-target detection; we redesign the foreground region assembly step to avoid the repeated inference required by previous methods. Extensive experiments on the VisDrone and UAVDT datasets demonstrate that the two proposed modules significantly improve small-target detection compared with the YOLOX-M baseline. Our code is publicly available at: https://github.com/LearnYZZ/GLSDet.
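To make the fusion idea concrete, below is a minimal PyTorch sketch of a semantic-guided fusion block under a simple gating assumption: the high-level semantic map is projected and upsampled to the resolution of the low-level texture map, then used to re-weight the texture features before the two are summed. The class name `SemanticGuidedFusion`, the channel widths, and the gating layout are illustrative assumptions, not the authors' GLSDet implementation; see the linked repository for the actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticGuidedFusion(nn.Module):
    """Hypothetical sketch: fuse a deep (semantic) feature map into a
    shallow (texture) feature map, letting semantics gate the texture
    features before summation. Not the paper's exact module."""

    def __init__(self, low_channels: int, high_channels: int):
        super().__init__()
        # Project the high-level map to the low-level channel width.
        self.reduce = nn.Conv2d(high_channels, low_channels, kernel_size=1)
        # Predict per-pixel, per-channel guidance weights from semantics.
        self.gate = nn.Sequential(
            nn.Conv2d(low_channels, low_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Align semantics to the spatial size of the texture map.
        high = self.reduce(high)
        high = F.interpolate(high, size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Semantic weights gate the texture features; then fuse by addition.
        return low * self.gate(high) + high


# Usage: fuse a stride-32 semantic map into a stride-8 texture map.
low = torch.randn(1, 128, 80, 80)    # shallow, high-resolution features
high = torch.randn(1, 512, 20, 20)   # deep, low-resolution features
fused = SemanticGuidedFusion(128, 512)(low, high)
print(fused.shape)  # torch.Size([1, 128, 80, 80])
```

Gating (multiplicative re-weighting) is one common way to let high-level semantics suppress background texture while preserving small-object detail; the paper's actual guidance and alignment mechanism may differ.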