Motion Dreamer: Realizing Physically Coherent Video Generation through Scene-Aware Motion Reasoning

08 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: motion-coherent video generation
Abstract: Current video generation models often fail to produce logically and physically coherent future scenarios, a critical weakness for applications in autonomous driving and robotics. This stems from a fundamental conflict in end-to-end training: the pursuit of perceptual fidelity diverts capacity from modeling long-range temporal structure, while architectural priors fail to enforce physical laws. We introduce Motion Dreamer, a two-stage framework that resolves this conflict by explicitly decoupling motion reasoning from visual synthesis. Our approach is designed to generate complex scenes from an initial frame and sparse motion cues. To achieve this, we introduce instance flow, a novel sparse-to-dense motion representation, and a motion inpainting training strategy. Together, these techniques allow the model to robustly infer a complete, coherent motion field from partial inputs. This motion-aware representation then guides a synthesis model to generate high-fidelity video grounded in plausible dynamics. In extensive experiments on robotics and physics benchmarks and on a large-scale driving dataset, Motion Dreamer significantly outperforms leading methods in both motion coherence and visual realism.
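To make the decoupled two-stage design and the motion-inpainting strategy concrete, the following is a minimal sketch under stated assumptions; the abstract gives no implementation details, so all names here (MotionReasoner, inpaint_corrupt, the tensor layouts, and the training step) are illustrative assumptions, not the authors' actual interfaces.

```python
import torch
import torch.nn as nn

class MotionReasoner(nn.Module):
    """Stage 1 (hypothetical): complete a dense motion field from an initial
    frame plus sparse per-instance motion cues ("instance flow")."""
    def __init__(self, channels: int = 16):
        super().__init__()
        # Frame (3 ch) + sparse flow (2 ch) + validity mask (1 ch) -> dense flow (2 ch).
        self.net = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 2, 3, padding=1),
        )

    def forward(self, frame, sparse_flow, mask):
        return self.net(torch.cat([frame, sparse_flow, mask], dim=1))

def inpaint_corrupt(dense_flow: torch.Tensor, keep_prob: float = 0.25):
    """Motion-inpainting corruption (assumed form): randomly hide most of the
    ground-truth flow so training forces sparse-to-dense completion."""
    b, _, h, w = dense_flow.shape
    mask = (torch.rand(b, 1, h, w, device=dense_flow.device) < keep_prob).float()
    return dense_flow * mask, mask

# Toy training step under these assumptions.
reasoner = MotionReasoner()
frame = torch.rand(2, 3, 64, 64)      # initial frame
gt_flow = torch.randn(2, 2, 64, 64)   # ground-truth dense motion field
sparse_flow, mask = inpaint_corrupt(gt_flow)
pred_flow = reasoner(frame, sparse_flow, mask)
loss = nn.functional.l1_loss(pred_flow, gt_flow)
loss.backward()
# Stage 2 (not sketched here) would condition a separate video synthesis
# model on pred_flow, grounding the generated frames in the inferred motion.
```

At inference, the same split would let a user supply only a few motion cues (e.g., one arrow per vehicle) and have Stage 1 fill in the full field before Stage 2 renders the video; this is the presumed benefit of decoupling reasoning from synthesis.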
Supplementary Material: zip
Primary Area: generative models
Submission Number: 2962