Keywords: Dynamic Reconstruction, 3D Gaussian Splatting
Abstract: Reconstructing dynamic 3D scenes from monocular videos remains a fundamentally challenging problem due to the presence of non-rigid motion, occlusion, appearance variation, and the absence of direct depth supervision. While neural radiance fields (NeRFs) have achieved remarkable results in static scene reconstruction, their computational inefficiency and per-scene optimization make them less practical for large-scale or real-time dynamic applications. Recent advances in 3D Gaussian Splatting (3DGS) provide a more efficient alternative, offering real-time rendering and faster convergence. However, existing 3DGS-based methods typically employ a unified deformation field to model both static and dynamic elements, often leading to motion bleeding, geometric artifacts, and temporal instability. In this paper, we propose SplitGaussian, a novel framework for monocular dynamic scene reconstruction that explicitly separates static and dynamic components within the 3DGS paradigm. By decoupling the learning of deformation for moving and non-moving regions, our method mitigates interference between motion modeling and static geometry preservation. We introduce independent deformation networks for each component, enabling precise motion representation while maintaining the integrity of static regions. Furthermore, to improve rendering quality and training stability, we propose a render-frequency-aware pruning strategy that filters out unreliable or redundant Gaussians with minimal visual contribution. Experiments on complex dynamic scenes demonstrate that our method achieves superior visual fidelity, temporal consistency, and training stability compared to recent baselines. Our approach represents a step toward scalable and artifact-free dynamic reconstruction using Gaussian-based representations.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6499