R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | Everyone | CC BY 4.0
Abstract: Novel view synthesis and the decoupling of dynamic objects from the static environment in monocular video are both long-standing challenges in computer vision and computer graphics. Most previous NeRF-based methods rely on implicit representations, which require additional supervision and training time. Later, various explicit representations, such as multi-planes or 3D Gaussian splatting, have been applied to novel view synthesis for dynamic scenes. They usually encode the dynamics by introducing an additional time dimension or a deformation field. These methods greatly reduce time consumption, but still fail to achieve high rendering quality in some scenes, especially real-world scenes. For the decoupling problem, previous neural radiance field methods require the relevant parameters to be retuned frequently for different scenes, which is inconvenient in practice. To address these problems, we propose a new representation of dynamic scenes based on tensor decomposition, which we call R4D-planes. The key to our method is remapping, which compensates for the shortcomings of the plane structure by fusing space-time information and remapping it to new indices. Furthermore, we implement a new decoupling structure that efficiently decouples dynamic and static scenes in a self-supervised manner. Experimental results show that our method achieves better rendering quality and training efficiency in both view synthesis and decoupling tasks for monocular scenes.
Primary Subject Area: [Experience] Art and Culture
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: This work aims to address the shortcomings of current neural radiance field approaches to reconstructing dynamic scenes from monocular video. The proposed method enables high-quality dynamic scene rendering, as well as complete decoupling of dynamic and static scenes. Users can record a monocular video of a dynamic scene with a phone and then reconstruct and render the scene with this method, which is of great significance for virtual reality.
Supplementary Material: zip
Submission Number: 3218