TL;DR: SceneDirector bridges explicit geometric guidance with generative priors to enable unified, high-fidelity editing of both objects and ego-trajectories in driving videos within a single inference pass.
Abstract: Validating autonomous driving systems requires diverse scenarios, yet real-world data collection is biased and costly. Editing existing driving logs offers a scalable solution, but simultaneously editing objects and ego-trajectory—termed unified editing—remains challenging.
Current methods face an inherent trade-off between the generative flexibility needed for object editing and the physical precision needed for trajectory control.
To address this, we introduce SceneDirector, a diffusion-based framework that bridges explicit geometry and generative priors.
For explicit geometry, we leverage LiDAR-guided depth completion to construct dense scene geometry and integrate editable 3D assets to form a Unified Geometric Scaffold, providing rigorous structural guidance for unified editing.
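As a rough illustration of the geometric side, the sketch below projects LiDAR points into a sparse depth map and then depth-composites a rendered 3D asset into it. It is not the paper's implementation: the function names (`project_lidar_to_depth`, `composite_asset`), the pinhole intrinsics `K`, and the use of `inf` for empty pixels are assumptions made for this example, and the depth completion step that densifies the sparse map is omitted entirely.

```python
import numpy as np

def project_lidar_to_depth(points_cam, K, height, width):
    """Project LiDAR points (N, 3), already in the camera frame with z
    pointing forward, into a sparse depth map. Empty pixels hold inf.
    Hypothetical helper for illustration only."""
    depth = np.full((height, width), np.inf)
    pts = points_cam[points_cam[:, 2] > 0]           # drop points behind the camera
    uvw = pts @ K.T                                   # pinhole projection
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # Keep the nearest point when several land on the same pixel.
    np.minimum.at(depth, (v[inside], u[inside]), pts[inside, 2])
    return depth

def composite_asset(scene_depth, asset_depth):
    """Merge a rendered asset depth map (inf where the asset is absent)
    into the scene depth with a per-pixel nearest-surface test."""
    asset_visible = asset_depth < scene_depth
    return np.minimum(scene_depth, asset_depth), asset_visible

# Toy usage with an assumed 64x48 camera.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 10.0],    # lands on pixel (row 24, col 32)
                   [0.0, 0.0, 5.0],     # same pixel, closer
                   [0.0, 0.0, -2.0]])   # behind the camera, ignored
sparse = project_lidar_to_depth(points, K, height=48, width=64)

asset = np.full((48, 64), np.inf)
asset[24, 32] = 3.0                      # asset in front of the scene surface
fused, visible = composite_asset(sparse, asset)
```

The nearest-surface test is what lets an inserted asset occlude, or be occluded by, the existing scene in a way that stays consistent when the camera trajectory changes.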
To leverage generative priors, we encode the source video into a Static Texture Bank to provide rich appearance context.
Our proposed Mask-Gated Reference Attention bridges these modalities. Guided by a geometric uncertainty metric, this mechanism dynamically regulates the interaction between the scaffold and the bank—preserving reliable geometry while adaptively injecting textures for semantic refinement.
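One way to picture the gating described above is as cross-attention whose output is scaled by a per-token uncertainty value before being added back to the scaffold features. The following is a minimal sketch under that assumption, not the authors' architecture: the function name, the residual form, the absence of learned query/key/value projections, and the convention that uncertainty lies in [0, 1] are all choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mask_gated_reference_attention(scaffold_tokens, bank_tokens, uncertainty):
    """Hypothetical gated cross-attention.

    scaffold_tokens: (N, d) features derived from the geometric scaffold
    bank_tokens:     (M, d) features from the texture bank
    uncertainty:     (N,)   geometric uncertainty per scaffold token, in [0, 1]
    """
    d = scaffold_tokens.shape[-1]
    scores = scaffold_tokens @ bank_tokens.T / np.sqrt(d)   # (N, M)
    attn = softmax(scores, axis=-1)
    retrieved = attn @ bank_tokens                          # (N, d)
    gate = np.clip(uncertainty, 0.0, 1.0)[:, None]          # (N, 1)
    # Reliable tokens (gate near 0) pass through unchanged;
    # uncertain tokens (gate near 1) receive texture-bank features.
    return scaffold_tokens + gate * retrieved

rng = np.random.default_rng(0)
scaffold = rng.normal(size=(4, 8))
bank = rng.normal(size=(6, 8))
uncertainty = np.array([0.0, 1.0, 0.0, 1.0])
out = mask_gated_reference_attention(scaffold, bank, uncertainty)
```

With this form, tokens whose geometry is trusted are left untouched, while tokens flagged as uncertain are refined with appearance information retrieved from the bank.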
Extensive evaluations demonstrate that SceneDirector outperforms state-of-the-art methods in both controllability and visual quality.
Lay Summary: Autonomous driving systems need to be tested in many diverse and challenging situations before they can be safely deployed. However, collecting real-world driving data for rare or dangerous events is expensive, time-consuming, and sometimes unsafe. This paper presents SceneDirector, a method for creating realistic edited driving videos from existing driving recordings.
SceneDirector allows users to modify a driving scene in several ways, such as adding, removing, replacing, or moving vehicles, while also changing the path of the ego vehicle. This makes it possible to create new driving scenarios without having to collect them in the real world. The key idea is to combine reliable 3D scene structure with the ability of modern generative AI models to produce realistic visual details. As a result, the edited videos can both follow the intended spatial layout and look natural.
Experiments show that SceneDirector produces more controllable and visually realistic results than previous methods. We believe this work can help researchers build richer simulation data for testing autonomous driving systems, especially for rare or safety-critical cases.
Originally Submitted Supplementary Material: zip
Primary Area: Applications->Computer Vision
Keywords: Diffusion Models, Video Editing, Autonomous Driving, Ego-trajectory Editing, Object Editing
Originally Submitted PDF: pdf
Submission Number: 427