Learning to Fly Camera Drones by Watching Internet Videos

02 Sept 2025 (modified: 01 Dec 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: camera drone motion planning; AI videography
Abstract: Camera drones offer unique perspectives and dynamic motions, yet automating their control for drone videography remains an open problem. Unlike navigation or racing, drone videography has no well-accepted reward function for human viewing experience, which makes reinforcement learning approaches ill-suited. We therefore propose an imitation learning pipeline that learns from Internet videos by mimicking expert operations. In the absence of teleoperation data such as controller inputs and flight logs, we use reconstructed 3D camera poses to estimate camera drone trajectories. Importantly, to ensure data quality, we develop a scalable filtering scheme based on trajectory smoothness. After discarding more than three quarters of the processed data, we obtain 99k high-quality trajectories, which form the largest camera drone motion dataset. To evaluate this new task, we introduce an interactive evaluation environment with 38 natural scenes and 7 real city scans, along with benchmark metrics at both the instance and dataset levels. As a minor contribution, we present a strong baseline named DVGFormer. Despite its architectural simplicity, the proposed approach can reproduce complex cinematic behaviors such as obstacle‑aware weaving, scenic reveals, and orbiting shots, verifying the effectiveness of the proposed imitation learning formulation. Data and code are available.
Primary Area: learning on time series and dynamical systems
Submission Number: 707