UAV Video Vehicle Detection: Benchmark and Baseline

Yun Xiao; Jinfa Wang; Zhicheng Zhao; Bo Jiang; Chenglong Li; Jin Tang

Published: 01 Jan 2025, Last Modified: 20 May 2025. IEEE Trans. Geosci. Remote. Sens. 2025. License: CC BY-SA 4.0

Abstract: With the growing use of unmanned aerial vehicles (UAVs) in intelligent transportation systems, vehicle detection in UAV videos has received increasing attention. Precise categorization and detection of vehicles in UAV videos is important in many practical applications. However, existing object detection methods, tailored for natural images, often fall short of accurately identifying vehicle objects. Additionally, high-altitude UAV imaging mainly employs horizontal bounding box annotation, which frequently leads to significant obstruction and overlap. Hence, we propose a new task called UAV video vehicle detection (VVD) to achieve precise detection and categorization of vehicles in high-altitude UAV imaging environments. To facilitate the research and development of UAV VVD, we construct the first large-scale, well-annotated UAV VVD benchmark dataset, which includes 70 UAV videos captured at a 500-m altitude, with 361,489 vehicle instances annotated with oriented bounding boxes and vehicle categories. Moreover, we introduce a novel category refinement network (CRNet) that extracts and refines vehicle object features from the bounding boxes of the detection results to classify vehicle categories. This approach effectively eliminates the interference of the background and other vehicle objects in candidate boxes. Notably, the vehicle object features are projected into a subspace, enabling the category refinement module (CRM) to focus more on the distinctive characteristics of the vehicle object itself through normalization operations. We conduct extensive experiments on the proposed VVD dataset. Experimental results demonstrate the superiority and effectiveness of the proposed CRNet method. The relevant code and dataset are available at https://github.com/mmic-lcl.
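
The abstract's point about horizontal versus oriented bounding boxes can be illustrated with a small geometric sketch. This is not the authors' code and does not use the VVD annotation format; the function names and the pixel sizes (a 40 x 16 px vehicle rotated by 45 degrees) are hypothetical values chosen only for illustration. It shows how the horizontal box that encloses a rotated vehicle covers much more area than the oriented box, and how two neighbouring vehicles whose oriented boxes are disjoint can still have overlapping horizontal boxes.

```python
import math

def obb_corners(cx, cy, w, h, angle_deg):
    """Corners of an oriented box centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half]

def enclosing_hbb(corners):
    """Smallest axis-aligned (horizontal) box containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

def hbb_area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def hbb_iou(a, b):
    """Intersection over union of two horizontal boxes (x0, y0, x1, y1)."""
    inter = hbb_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = hbb_area(a) + hbb_area(b) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical scene: two 40 x 16 px vehicles parked side by side at 45 degrees.
# The second centre is shifted 20 px along the short axis, so the oriented
# boxes are separated by a 4 px gap and do not overlap.
W, H, ANGLE = 40.0, 16.0, 45.0
shift = 20.0 / math.sqrt(2.0)
hbb_a = enclosing_hbb(obb_corners(100.0, 100.0, W, H, ANGLE))
hbb_b = enclosing_hbb(obb_corners(100.0 - shift, 100.0 + shift, W, H, ANGLE))

obb_area = W * H                      # area of each oriented box
area_ratio = hbb_area(hbb_a) / obb_area
overlap = hbb_iou(hbb_a, hbb_b)
print(f"HBB/OBB area ratio: {area_ratio:.2f}, HBB IoU: {overlap:.2f}")
```

In this toy setup each oriented box covers 640 px², while its enclosing horizontal box covers 1568 px² (about 2.45 times as much), and the two horizontal boxes have an IoU of about 0.26 even though the vehicles themselves do not touch. The extra area is background and neighbouring vehicles, which is the kind of interference the abstract describes.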
