Abstract: As drone technology develops and becomes more widespread, drones are now widely used across many fields, particularly in video applications. We propose DroneGPT, a neural-symbolic method that builds on VISPROG and requires no task-specific training. Given a natural language instruction, it leverages the in-context learning ability of large language models to generate and execute modular programs that solve complex, compositional drone vision tasks. The modules in a program can call off-the-shelf computer vision models, for example for object detection, or write their own image processing programs, and are then chained together to answer questions about drone video. We believe that DroneGPT can broaden the range of video tasks that drones can handle and further enrich the capabilities of contemporary drones.
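The generate-and-execute idea described above can be illustrated with a minimal sketch. It is not the DroneGPT implementation: the module names `DETECT` and `COUNT`, the one-step-per-line program syntax, and the toy "video" (a list of per-frame object labels) are all assumptions made for illustration, and the program text is hand-written here where a language model would generate it from the instruction.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for off-the-shelf vision models. A real system would
# run an object detector on image frames; here a frame is a list of labels.
def detect(frames: List[List[str]], label: str) -> List[List[str]]:
    return [[obj for obj in frame if obj == label] for frame in frames]

def count(detections: List[List[str]]) -> int:
    # Largest number of matching objects visible in any single frame.
    return max((len(d) for d in detections), default=0)

# Registry that maps module names used in a program to Python callables.
MODULES: Dict[str, Callable[..., Any]] = {"DETECT": detect, "COUNT": count}

# One program step has the assumed form: OUTPUT = MODULE(key=value, ...)
STEP = re.compile(r"^(\w+)\s*=\s*(\w+)\((.*)\)$")

def run_program(program: str, state: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the steps in order, storing each output under its name."""
    state = dict(state)  # do not mutate the caller's inputs
    for line in program.strip().splitlines():
        match = STEP.match(line.strip())
        if match is None:
            raise ValueError(f"cannot parse step: {line!r}")
        out, module, arg_text = match.groups()
        kwargs: Dict[str, Any] = {}
        for pair in filter(None, (p.strip() for p in arg_text.split(","))):
            key, value = (s.strip() for s in pair.split("=", 1))
            # Quoted values are string literals; bare names refer to inputs
            # or to the outputs of earlier steps.
            kwargs[key] = value[1:-1] if value[0] in "'\"" else state[value]
        state[out] = MODULES[module](**kwargs)
    return state

# Hand-written program for the question "How many cars are in the video?".
PROGRAM = """
CARS = DETECT(frames=VIDEO, label='car')
ANSWER = COUNT(detections=CARS)
"""

video = [["car", "tree"], ["car", "car", "person"], ["tree"]]
result = run_program(PROGRAM, {"VIDEO": video})
print(result["ANSWER"])
```

Keeping every intermediate output in `state` means each step of the program can be inspected after execution, which is one practical benefit of a modular, program-based design over a single end-to-end model.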