Keywords: Gaussian Splatting, Diffusion Models, Generative Dynamics
TL;DR: Text-driven animation of 3D Gaussian Splatting scenes based on video diffusion models and an effective approach for lifting 2D videos to realistic 3D motion.
Abstract: State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack “liveliness,” a key component for creating engaging 3D experiences. Recent video diffusion models generate realistic videos with complex motion and enable animation of 2D images; however, they cannot naively be used to animate 3D scenes, as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes represented via Gaussian Splatting. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine them with a robust technique for lifting 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animation of complex, pre-existing 3D scenes in a robust manner and further supports a large variety of object classes, whereas related work is mostly limited to prior-based character animation or single 3D objects due to biases of the underlying video diffusion models. Our model can readily be used to create immersive and engaging 3D experiences for arbitrary scenes in a consistent manner.
Supplementary Material: pdf
Submission Number: 57