Intuitive UAV Operation: A Novel Dataset and Benchmark for Multi-Distance Gesture Recognition

Published: 01 Jan 2024 · Last Modified: 23 Apr 2025 · IJCNN 2024 · CC BY-SA 4.0
Abstract: UAV gesture recognition, a novel form of human-computer interaction, offers an intuitive way to control UAVs in a variety of environments. However, comprehensive datasets for AI-powered UAV gesture recognition are lacking. This paper makes several contributions: (i) We introduce MD-UHGRD, a UAV static-gesture dataset with 20,000 annotated images collected from a diverse group of participants under different environmental conditions. This dataset is expected to bridge a significant gap in UAV gesture recognition research. (ii) We propose SA-YOLO, a multifunctional UAV gesture recognition method that not only performs gesture recognition but also supports face and pedestrian tracking, improving UAV control in complex scenarios. SA-YOLO incorporates the Spatial Asymptotic Feature Pyramid Network (SAFPN), Scale Pyramid Pooling with Cross Stage Partial Networks Convolution (SPPCSPC), and Space-to-Depth Convolution (SPD-Conv). (iii) An extensive evaluation of SA-YOLO on MD-UHGRD establishes it as a benchmark in this domain. Our method combines high accuracy, fast processing, and a compact model size, achieving 93.2% mean Average Precision (mAP) with 10.3 million parameters at 48 frames per second (FPS). Among competing models, SA-YOLO achieves the highest mAP while maintaining a favorable balance between model size and FPS. The dataset and code are available at: https://github.com/ijcnn2024/SA-YOLO.
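Of the components named above, SPD-Conv is built on the standard space-to-depth rearrangement, which folds spatial blocks into the channel dimension so that a subsequent non-strided convolution can downsample without discarding fine-grained information (useful for small or distant gestures). The following is a minimal NumPy sketch of that rearrangement only; the function name and implementation are illustrative, not the paper's code.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Rearrange spatial blocks into channels.

    Maps an array of shape (C, H, W) to (C * block**2, H / block, W / block),
    keeping every input value (no information is discarded, unlike a
    strided convolution).
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split H and W into (H/block, block) and (W/block, block) factors.
    x = x.reshape(c, h // block, block, w // block, block)
    # Move both block axes next to the channel axis.
    x = x.transpose(0, 2, 4, 1, 3)  # (C, block, block, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

# Example: a 2-channel 4x4 feature map becomes 8 channels of 2x2.
x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
y = space_to_depth(x)
print(y.shape)  # (8, 2, 2)
```

In SPD-Conv this rearrangement is followed by a stride-1 convolution that mixes the expanded channels; here only the lossless reshaping step is shown.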