DeDiff-4DGS: Fusing Temporal Correlations and Diffusion Priors for Dynamic 3D Scenes
Abstract: Reconstructing dynamic 3D (4D) scenes is challenging due to complex temporal dynamics and viewpoint sparsity in monocular videos. Existing temporal extensions of 3D Gaussian Splatting (3D-GS) often fail to capture temporal correlations across frames, leading to redundant 3D Gaussians and reduced efficiency. To address this limitation, we propose DeDiff-4DGS, a framework that integrates temporal correlations and diffusion priors through two novel modules. The Temporal 3D Gaussian Latent Fusion (T3DLF) module fuses temporal information from sparse reference frames to promote spatio-temporal coherence and reduce the number of required 3D Gaussians. The Latent Diffusion Converter for 3D Gaussians (LDC3D) module enriches reference frames with semantic priors, complementing T3DLF under sparse-view conditions. Experimental results on standard benchmarks demonstrate that DeDiff-4DGS delivers higher reconstruction quality and improved efficiency over current state-of-the-art approaches.