DroneGPT: Zero-shot Video Question Answering For Drones

Hongjie Qiu, Jinqiang Li, Junhao Gan, Shuwen Zheng, Liqi Yan

Published: 2024, Last Modified: 19 Feb 2025, Venue: CVDL 2024, License: CC BY-SA 4.0

Abstract: With the continued development and spread of drone technology, drones are now used in many fields, and drone video applications are an especially prominent one. We propose DroneGPT, a neuro-symbolic method that builds on VISPROG and requires no task-specific training. It leverages the in-context learning ability of large language models to generate and execute modular programs that solve complex, compositional drone vision tasks from natural language instructions. The modules in a program can call off-the-shelf computer vision models (for example, for object detection) or run image-processing code that the language model writes itself; these modules are then chained together to answer questions about drone videos. We believe DroneGPT can broaden the range of video tasks drones can handle and further extend the capabilities of contemporary drones.
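The abstract describes generating and then executing modular programs. The following is a minimal Python sketch of only the execution side, assuming a program has already been produced by a language model. The `NAME=MODULE(key=value, ...)` line syntax is loosely modeled on VISPROG-style programs; the module names DETECT, MAX and RESULT, the stub detector, and the toy "frames" are hypothetical illustrations, not DroneGPT's actual modules or interface.

```python
import ast
import re
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for an off-the-shelf detector: each "frame" is a
# dict mapping object labels to counts, so no real vision model is needed.
def detect(frames: List[Dict[str, int]], label: str) -> List[int]:
    """Return the number of detections of `label` in each frame."""
    return [frame.get(label, 0) for frame in frames]

def max_count(counts: List[int]) -> int:
    """Aggregate per-frame counts into one answer (the peak count)."""
    return max(counts) if counts else 0

def answer(value: Any) -> str:
    """Format the final value as a textual answer."""
    return str(value)

# Registry mapping module names used in a generated program to callables.
MODULES: Dict[str, Callable[..., Any]] = {
    "DETECT": detect,
    "MAX": max_count,
    "RESULT": answer,
}

# One program step per line: TARGET=MODULE(arg1=..., arg2=...)
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$")

def run_program(program: str, inputs: Dict[str, Any]) -> Any:
    """Execute each step in order, storing every result in `env`."""
    env: Dict[str, Any] = dict(inputs)
    last = None
    for line in program.strip().splitlines():
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"cannot parse step: {line!r}")
        target, module_name, arg_text = match.groups()
        kwargs: Dict[str, Any] = {}
        if arg_text.strip():
            # Simplification: arguments are split on commas, so string
            # literals containing commas are not supported.
            for part in arg_text.split(","):
                key, value = (s.strip() for s in part.split("=", 1))
                # Bare names refer to earlier variables; anything else is a literal.
                kwargs[key] = env[value] if value in env else ast.literal_eval(value)
        last = env[target] = MODULES[module_name](**kwargs)
    return last

# A program of the kind an LLM might emit for "What is the largest number
# of cars visible at once?" (hand-written here for illustration).
PROGRAM = """
COUNTS=DETECT(frames=VIDEO, label='car')
PEAK=MAX(counts=COUNTS)
ANSWER=RESULT(value=PEAK)
"""

video = [{"car": 2}, {"car": 3, "person": 1}, {"person": 4}]
print(run_program(PROGRAM, {"VIDEO": video}))  # prints 3
```

Keeping the modules in a name-to-callable registry means new capabilities (a different detector, or generated image-processing code wrapped as a function) can be added without changing the interpreter, which is the property that makes this program-then-execute design compositional.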