Skybound Magic: Enabling Body-Only Drone Piloting Through a Lightweight Vision–Pose Interaction Framework

Published: 28 Aug 2025, Last Modified: 31 Aug 2025 · International Journal of Human–Computer Interaction · CC BY 4.0
Abstract: Natural and safe interaction with small UAVs often requires heavy vision processing or bulky controllers. We present an edge-only framework enabling intuitive whole-body gesture control. A YOLOv8-Nano detector isolates the user, after which a MobileNet-based PoseNet extracts 17 skeletal keypoints. A lightweight rule-based classifier maps eight canonical poses to flight commands (e.g., take-off, shift, hover, land, emergency stop). A multimodal feedback loop—skeletal overlay, LED flashes, and synthesized voice—supports real-time self-correction. Implemented on a Jetson Nano, the system runs at ∼30 FPS, achieving 91.8% gesture accuracy at 101 ms latency with low power use. In tests with 168 participants, the interface scored 4.6/5 in satisfaction, with high ratings from security and emergency operators. The results show that robust, natural human–drone interaction is feasible without off-board computation, enabling field-ready aerial robots.
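The pipeline described above ends in a rule-based classifier that turns 17 skeletal keypoints into discrete flight commands. A minimal sketch of how such a classifier might look is shown below; the keypoint layout follows the standard 17-point COCO convention used by PoseNet-style models, but the specific rules, thresholds, and command names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a rule-based pose-to-command classifier over
# 17 COCO-style keypoints (as produced by a MobileNet-based PoseNet).
# Thresholds and rules are illustrative assumptions, not the paper's.

COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def classify_pose(kp):
    """Map keypoints to a flight command.

    kp: dict of keypoint name -> (x, y) in image coordinates,
    where y grows downward (OpenCV convention).
    """
    nose_y = kp["nose"][1]
    lw, rw = kp["left_wrist"], kp["right_wrist"]
    ls, rs = kp["left_shoulder"], kp["right_shoulder"]

    # Both wrists raised above the head -> take-off.
    if lw[1] < nose_y and rw[1] < nose_y:
        return "take_off"

    # Arms extended horizontally at shoulder height -> hover.
    if (abs(lw[1] - ls[1]) < 30 and abs(rw[1] - rs[1]) < 30
            and abs(lw[0] - ls[0]) > 50 and abs(rw[0] - rs[0]) > 50):
        return "hover"

    # Both wrists dropped below the hips -> land.
    if lw[1] > kp["left_hip"][1] and rw[1] > kp["right_hip"][1]:
        return "land"

    return "no_command"
```

A classifier of this kind runs in microseconds per frame, which is consistent with the paper's point that command mapping adds negligible cost next to detection and pose estimation; the heavy lifting stays inside the two neural stages.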