3DPhysVideo: 3D Scene Reconstruction and Physical Animation Leveraging a Video Generation Model via Consistency-Guided Flow SDE

ICLR 2026 Conference Submission 1309 Authors

03 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Video Generation, 3D Reconstruction, Physically Plausible Video
TL;DR: A training-free pipeline that turns a single image into a physically plausible video by repurposing an I2V flow model, via a Consistency-Guided Flow SDE, for both 360-degree 3D scene reconstruction and MPM-simulation-guided video synthesis.
Abstract: Video generative models have made remarkable progress, yet they often produce visual artifacts that violate real-world physical dynamics. Recent works such as PhysGen3D tackle single-image-to-3D physics through mesh reconstruction and Physically-Based Rendering, but challenges remain in modeling fluid dynamics and photorealism. This work introduces 3DPhysVideo, a novel training-free pipeline that generates physically realistic videos from a single image. We repurpose an off-the-shelf video model for two stages. First, we use it as a novel view synthesizer to reconstruct complete 360-degree 3D scene geometry by guiding the image-to-video (I2V) flow model with rendered point clouds derived from an initial 3D estimate. Second, we run a Material Point Method (MPM) physics simulation on this geometry and use the simulated point cloud to guide the same I2V flow model to synthesize the final high-quality videos. Our Consistency-Guided Flow SDE, which decomposes the predicted velocity of the I2V flow model into a denoising term and a consistency bias, allows us to effectively repurpose the model for both 3D reconstruction and simulation-guided video generation. Our method bridges the gap from a single image to physically plausible video while remaining efficient enough to run on a single consumer GPU. In extensive experiments, our approach outperforms state-of-the-art baselines on both GPT-based evaluations and the VideoPhy physics-consistency benchmark, across diverse scenarios including single-object, multi-object, and fluid-interaction sequences.
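The velocity decomposition named in the abstract can be pictured with a minimal sketch. The snippet below assumes a rectified-flow-style Euler-Maruyama sampling step in which the predicted velocity (the denoising term) is combined with a consistency bias pulling the latent toward a guidance signal such as a rendered point-cloud frame; the function name `guided_flow_step`, the weight `lambda_c`, and the form of the bias are illustrative assumptions, not the paper's actual formulation.

```python
import torch

def guided_flow_step(x_t, t, dt, velocity_fn, guidance, lambda_c=0.5, sigma=0.1):
    """One illustrative Euler-Maruyama step of a consistency-guided flow SDE.

    Assumed decomposition (not the authors' exact equations): the drift is
    the flow model's denoising velocity plus a consistency bias toward a
    guidance frame (e.g. a rendered or simulated point cloud).
    """
    v = velocity_fn(x_t, t)                  # denoising velocity from the I2V flow model
    bias = lambda_c * (guidance - x_t)       # consistency bias toward the guidance signal
    noise = sigma * (dt ** 0.5) * torch.randn_like(x_t)
    return x_t + (v + bias) * dt + noise

# Toy usage: a stand-in velocity field and a fixed guidance frame.
if __name__ == "__main__":
    guidance = torch.zeros(1, 3, 8, 64, 64)       # pretend point-cloud render
    x = torch.randn_like(guidance)                # noisy latent video
    velocity_fn = lambda x_t, t: guidance - x_t   # dummy flow model for the demo
    for step in range(10):
        x = guided_flow_step(x, t=step / 10, dt=0.1,
                             velocity_fn=velocity_fn, guidance=guidance)
    print(float((x - guidance).abs().mean()))     # shrinks toward the noise floor
```

In this toy setting the bias simply pulls the latent toward a fixed target; in the pipeline the abstract describes, the same mechanism would instead be driven by rendered point clouds (stage one, 3D reconstruction) and MPM-simulated point clouds (stage two, video synthesis).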
Supplementary Material: zip
Primary Area: generative models
Submission Number: 1309