Abstract: Volumetric video coding based on neural radiance fields has recently gained significant attention for storing and transmitting three-dimensional (3D) scenes captured from multi-view video. Because the neural networks are trained to synthesize novel views of the surrounding 3D scene, compressing the model and then rendering color and geometry from the decompressed model can serve as a 3D video coding system. Although this approach outperforms conventional 3D video coding standards based on depth video, challenges remain in reducing overall model size to improve coding efficiency. In this paper, we propose a novel dynamic volumetric video coding technique that employs Groups of Volumes (GoVs) to divide multi-view video sequences into smaller chunks, addressing complex temporal dynamics. Our method represents volumetric video features as 3D spatial and temporal tensor matrices and vectors and encodes them within the GoVs. The tensors are compressed with an existing 2D video codec, enabling fast rendering and easing deployment. Experimental results validate that our method not only reduces the memory footprint but also maintains high-quality rendering compared with state-of-the-art methods.