Ctrl-V: Higher Fidelity Autonomous Vehicle Video Generation with Bounding-Box Controlled Object Motion
Abstract: Controllable video generation has attracted significant attention, largely due to advances in video diffusion models. In domains such as autonomous driving, accurately predicting object motion is essential. This paper addresses the key challenge of enabling fine-grained control over object motion in driving video synthesis. To accomplish this, we 1) employ a separate, specialized model to forecast the trajectories of object bounding boxes, 2) adapt and enhance a video diffusion network to generate video conditioned on these high-quality trajectory forecasts, and 3) exert precise control over object positions and movements using bounding boxes in both 2D and 3D space. Our method, Ctrl-V, uses modified and fine-tuned Stable Video Diffusion (SVD) models for both trajectory prediction and video generation. Extensive experiments on the KITTI, Virtual-KITTI 2, BDD100k, and nuScenes datasets validate the effectiveness of our approach in producing realistic and controllable videos. Anonymous project page: \url{https://aprudentmouse.github.io/ctrlv.github.io/}
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=35KCGjoWEs
Changes Since Last Submission:
- **Updated the title to emphasize the paper's focus on autonomous driving scenarios.**
- **Refined the abstract and conclusion to emphasize the focus on autonomous driving.**
- Expanded the motivation in the introduction, emphasizing why controllable object motion matters and the role it plays in advancing autonomous driving technologies.
- Revised the methodology section for improved clarity and readability.
- Added missing citations.
- Corrected a typo in the model diagram (Figure 2).
- Added labeling to Figure 3.
- Updated the project link.
- Expanded the discussion on limitations and failure cases in Appendix G.
- Included a concise discussion of the limitations of Ctrl-V within the main text.
Assigned Action Editor: ~Charles_Xu1
Submission Number: 3911