4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes
Abstract: Reconstructing dynamic scenes from video sequences is a highly promising task in the multimedia domain. While previous methods have made progress, they often struggle with slow rendering and with temporal complexities such as significant motion and object appearance/disappearance. In this paper, we propose SaRO-GS, a novel dynamic scene representation that achieves real-time rendering while effectively handling such temporal complexities. To address slow rendering, we adopt a Gaussian primitive-based representation and optimize the Gaussians in 4D space, which enables real-time rendering with the assistance of 3D Gaussian Splatting. To handle temporally complex dynamic scenes, we introduce a Scale-aware Residual Field, which accounts for the size of each Gaussian primitive while encoding its residual feature and aligns with the self-splitting behavior of Gaussian primitives. Furthermore, we propose an Adaptive Optimization Schedule that assigns different optimization strategies to Gaussian primitives based on their distinct temporal properties, thereby expediting the reconstruction of dynamic regions. Evaluations on monocular and multi-view datasets demonstrate that our method achieves state-of-the-art performance.
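To make the scale-aware idea concrete, below is a minimal illustrative sketch in PyTorch of how a residual field could blend coarse and fine spatial features according to each Gaussian's scale before decoding residual position and rotation offsets. The class name, grid resolutions, blending rule, and output parameterization are assumptions for illustration only, not the paper's implementation.

# Minimal sketch (assumptions only): each Gaussian queries two dense feature grids,
# and its own scale decides how much weight the coarse vs. fine level receives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAwareResidualField(nn.Module):
    def __init__(self, feat_dim=16, coarse_res=16, fine_res=64):
        super().__init__()
        # Two dense 3D feature grids at different resolutions (stand-ins for a
        # multi-resolution spatial encoding).
        self.coarse = nn.Parameter(0.01 * torch.randn(1, feat_dim, coarse_res, coarse_res, coarse_res))
        self.fine = nn.Parameter(0.01 * torch.randn(1, feat_dim, fine_res, fine_res, fine_res))
        # Small MLP decoding the blended feature (plus time) into residual offsets:
        # 3 for position and 4 for a quaternion rotation delta.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 64), nn.ReLU(),
            nn.Linear(64, 3 + 4),
        )

    def _sample(self, grid, xyz):
        # xyz in [-1, 1]; grid_sample expects a (1, N, 1, 1, 3) query for a (1, C, D, H, W) grid.
        g = xyz.view(1, -1, 1, 1, 3)
        feat = F.grid_sample(grid, g, align_corners=True)  # (1, C, N, 1, 1)
        return feat.view(grid.shape[1], -1).t()            # (N, C)

    def forward(self, xyz, scales, t):
        # xyz: (N, 3) Gaussian centers in [-1, 1]; scales: (N, 3) per-axis scales; t: scalar time.
        f_coarse = self._sample(self.coarse, xyz)
        f_fine = self._sample(self.fine, xyz)
        # Larger Gaussians lean on the coarse level, smaller ones on the fine level
        # (a simple monotone blend; the real weighting scheme is an assumption here).
        w = torch.sigmoid(-torch.log(scales.mean(dim=-1, keepdim=True) + 1e-8))
        feat = (1.0 - w) * f_coarse + w * f_fine
        t_col = torch.full((xyz.shape[0], 1), float(t), device=xyz.device)
        out = self.mlp(torch.cat([feat, t_col], dim=-1))
        return out[:, :3], out[:, 3:]  # residual position, residual rotation

# Toy usage: 1000 Gaussians with centers in [-1, 1] and small per-axis scales.
field = ScaleAwareResidualField()
d_xyz, d_rot = field(torch.rand(1000, 3) * 2 - 1, torch.rand(1000, 3) * 0.1 + 0.01, t=0.5)

The monotone blend simply routes large Gaussians toward the coarse grid and small ones toward the fine grid, mirroring the intuition that a primitive's size should determine the spatial frequency of the features it reads.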
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Dynamic scene reconstruction is at the forefront of multimedia technology research and is a core task in many related fields such as VR, AR, and the metaverse. It can significantly enhance the user experience of multimedia products by providing high interactivity, allowing users to explore dynamic scenes from any time, viewpoint, and location, enabling effects such as free-viewpoint video and bullet time. However, its current application faces two main challenges. Firstly, user interaction demands real-time response, yet current NeRF-based methods struggle to render in real time. Secondly, rendering quality directly affects the user experience, and quality in temporally complex scenes still requires improvement.
In this paper, we introduce SaRO-GS, a novel 4D scene representation, to tackle both challenges. To address slow rendering, SaRO-GS uses 4D Gaussians and the fast differentiable rasterizer from 3DGS to achieve real-time rendering. To enhance rendering quality, we introduce a Scale-aware Residual Field that considers Gaussian size information during encoding to produce more accurate features. We also propose an Adaptive Optimization strategy that assigns a unique schedule to each Gaussian based on its temporal characteristics. Evaluation on various datasets shows that SaRO-GS achieves high-quality reconstruction of temporally complex dynamic scenes in real time, substantially enhancing the user's interactive experience.
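As a rough illustration of per-Gaussian optimization scheduling, the sketch below amplifies gradients for Gaussians flagged as temporally dynamic so that dynamic regions converge faster; the mask, boost factor, and helper name are hypothetical and do not reproduce the paper's actual schedule.

# Illustrative sketch (assumptions only): dynamic Gaussians receive a larger effective
# learning rate than static ones by scaling their gradients before the optimizer step.
import torch

def adaptive_step(optimizer, positions, dynamic_mask, boost=5.0):
    """positions: (N, 3) learnable Gaussian centers with .grad populated by backward();
    dynamic_mask: (N,) bool tensor marking Gaussians in temporally dynamic regions."""
    if positions.grad is not None:
        scale = torch.where(dynamic_mask.unsqueeze(-1),
                            torch.full_like(positions, boost),
                            torch.ones_like(positions))
        positions.grad.mul_(scale)  # amplify updates for dynamic Gaussians
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)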
Supplementary Material: zip
Submission Number: 4096