Abstract: A Multiplane Image (MPI) is a volumetric scene representation that uses multiple layers of texture (RGB) and alpha (A) planes. It offers high deployability due to its compatibility with standard codecs and its low rendering complexity. While existing MPI methods have shown promising results, they are constrained either by a limited pose span or by the need to transmit multiple MPIs. In this paper, we present a novel framework that generates an efficient single MPI integrating information from multiple views. The key idea is to utilize the surface opacity estimates (A) to locate and retrieve occluded RGB pixels from other camera views that share matching depth, which we call Occlusion Guided Residuals (OGR). Additionally, we introduce an inter-layer texture filler, a learned RGB texture at intermediate depths between MPI layers, which handles scenes of continuous depth using a limited number of MPI layers. We composite the MPI from the aforementioned RGB textures and refine the alpha layers through training with multiview rendering supervision. Thus, over training iterations, we jointly optimize scene opacity (A) and textures (RGB), leading to an accurate MPI representation. Experiments on various multiview image and video datasets demonstrate that the proposed method achieves state-of-the-art performance with high data efficiency. Notably, with just 16 layers, the proposed method attains performance on par with methods that use twice as many layers or fuse multiple MPIs.
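To make the representation concrete, the following is a minimal sketch of the standard back-to-front alpha ("over") compositing by which an MPI's RGBA layers are rendered into an image. The array shapes, layer ordering, and function name are illustrative assumptions, not the paper's exact implementation, which additionally reprojects the planes into the target view before compositing.

```python
import numpy as np

def composite_mpi(rgb, alpha):
    """Render an MPI by back-to-front 'over' compositing (illustrative sketch).

    rgb:   (D, H, W, 3) color planes, ordered back (far) to front (near).
    alpha: (D, H, W, 1) per-plane opacity in [0, 1].
    Returns an (H, W, 3) rendered image.
    """
    out = np.zeros(rgb.shape[1:], dtype=rgb.dtype)
    for c, a in zip(rgb, alpha):
        # Each nearer plane occludes the accumulated result by its opacity.
        out = c * a + out * (1.0 - a)
    return out

# Hypothetical usage: a 16-layer MPI at 64x64 resolution.
rgb = np.random.rand(16, 64, 64, 3)
alpha = np.random.rand(16, 64, 64, 1)
image = composite_mpi(rgb, alpha)  # (64, 64, 3)
```

Because the rendered pixel is a differentiable function of every layer's RGB and A, multiview rendering supervision can jointly optimize both, as the abstract describes.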