Keywords: Gaussian Splatting; Motion Editing; Monocular Reconstruction
Abstract: Motion is one of the key components of deformable 3D scenes. Generative video models allow users to animate static scenes with novel motion specified by text prompts, but such reanimations often break down under 4D reconstruction. The generated videos frequently contain geometric artifacts, implausible motion, and occlusions, which hinder physically consistent 4D reanimation. In this work, we introduce \textbf{Restage4D}, a geometry-preserving pipeline for deformable scene reconstruction from a single edited video. Our key insight is to leverage the unedited original video as an additional source of supervision, allowing the model to propagate accurate structure into occluded and disoccluded regions.
To achieve this, we propose a video-rewinding training scheme that temporally bridges the edited and original sequences via a shared motion representation. We further introduce an occlusion-aware ARAP regularization to preserve local rigidity, and a disocclusion backtracing mechanism that supplements missing geometry in the canonical space. Together, these components enable robust reconstruction even when the edited input contains hallucinated content or inconsistent motion.
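For context, a minimal sketch of the rigidity prior underlying the ARAP term is given below. This is the standard as-rigid-as-possible energy over a neighborhood graph of points (or Gaussians); the occlusion-aware weighting described above is an extension not detailed in this abstract, so the weights $w_{ij}$ and the choice of neighborhood $\mathcal{N}(i)$ here are assumptions, not the submission's exact formulation:
\[
E_{\mathrm{ARAP}} = \sum_{i} \sum_{j \in \mathcal{N}(i)} w_{ij} \,\bigl\lVert (\mathbf{x}'_i - \mathbf{x}'_j) - \mathbf{R}_i (\mathbf{x}_i - \mathbf{x}_j) \bigr\rVert^2 ,
\]
where $\mathbf{x}_i$ and $\mathbf{x}'_i$ are the canonical and deformed positions, $\mathcal{N}(i)$ is the local neighborhood, and $\mathbf{R}_i$ is the best-fitting local rotation. An occlusion-aware variant would presumably modulate $w_{ij}$ by visibility, so that rigidity is enforced where observations are missing.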
We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance. Our method not only preserves deformable structure under novel motion, but also automatically corrects errors introduced by generative models, bridging the gap between flexible video synthesis and physically grounded 4D reconstruction.
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 337