Abstract: Recent advances in dynamic scene reconstruction using Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS) have created a demand for effective 4D editing tools. While existing methods primarily focus on appearance alterations or object removal, the challenge of adding objects to 4D scenes, which requires understanding of objects’ interactions with the original scene, remains largely unexplored.
We present a novel approach to address this gap, focusing on generating plausible motion for newly added objects in 4D scenes. Our key finding is that 2D image-based diffusion models carry strong scene-interaction priors that can be extracted from a static scene-object frame and propagated to novel frames of a dynamic 3D scene. Concretely, our method takes an object and its initial placement in a single frame as input, and aims to generate its position and orientation throughout the entire sequence. We first capture the object’s appearance, shape, and interaction with the original scene from the static edited frame by fine-tuning a 2D diffusion-based editor. Building on this, we propose an iterative algorithm that leverages the fine-tuned diffusion model to generate frame-to-frame motion for the new object. We show that our method significantly improves 4D motion generation for newly added objects compared to prior work on the diverse D-NeRF scene dataset.
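To make the iterative frame-to-frame step concrete, below is a minimal sketch of a warm-started, per-frame pose optimization loop. It assumes a differentiable compositing renderer and precomputed per-frame target images from the fine-tuned diffusion editor; the names `render_composite` and `edited_frames`, the 6-DoF pose parameterization, and the pixel-space loss are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def propagate_object_motion(render_composite, edited_frames, init_pose,
                            steps=200, lr=1e-2):
    """Sketch: generate per-frame position/orientation for a newly added object.

    Assumptions (hypothetical interfaces, not from the paper):
      - render_composite(pose, t): differentiably renders the dynamic scene at
        frame t with the object placed at `pose` (translation + rotation vector).
      - edited_frames[t]: target image for frame t produced by the fine-tuned
        2D diffusion-based editor.
    """
    poses = [init_pose.detach()]
    for t in range(1, len(edited_frames)):
        # Warm-start from the previous frame's pose so the motion stays coherent.
        pose = poses[-1].clone().requires_grad_(True)
        optimizer = torch.optim.Adam([pose], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            rendered = render_composite(pose, t)            # H x W x 3 composite
            loss = F.mse_loss(rendered, edited_frames[t])   # match the editor's target
            loss.backward()
            optimizer.step()
        poses.append(pose.detach())
    return poses  # object pose for every frame of the sequence
```

In practice the per-frame guidance would more likely come from the diffusion model itself (e.g., a score-distillation-style objective) rather than a fixed pixel target; the fixed-target variant above is only meant to show the warm-started, per-frame optimization structure.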