Abstract: Novel view synthesis (NVS) aims to synthesize photo-realistic images depicting a scene from existing source images. The core objective is that the synthesized images should be as close as possible to the actual scene content. In recent years, various approaches have shifted their focus toward the visual quality of images in continuous space or time. However, current methods for static scenes treat the rendering of each image as an isolated process, neglecting the geometric consistency inherent in a static scene. This usually results in incoherent visual experiences, such as flicker or artifacts, in synthesized image sequences. To address this limitation, we propose Multi-View Consistency View Synthesis (MCVS). MCVS leverages long short-term memory (LSTM) and a self-attention mechanism to model the spatial correlation between synthesized images, thereby driving them closer to the ground truth. MCVS not only enhances multi-view consistency but also improves the overall quality of the synthesized images. The proposed method is evaluated on the Tanks and Temples dataset and the FVS dataset. On average, its Learned Perceptual Image Patch Similarity (LPIPS) is better than that of state-of-the-art approaches by 0.14% to 0.16%, indicating the superiority of our approach.
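To make the abstract's core idea concrete, the sketch below shows, in minimal NumPy, how an LSTM can propagate state across a sequence of per-view features while scaled dot-product self-attention lets every view attend to every other view. This is only an illustrative sketch of the general mechanism named in the abstract, not the MCVS implementation; all shapes, weights, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    # One standard LSTM cell step; gate order: input, forget, cell, output.
    z = x @ W + h @ U + b                      # (4*H,)
    H = h.shape[-1]
    i = 1.0 / (1.0 + np.exp(-z[:H]))           # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))      # forget gate
    g = np.tanh(z[2 * H:3 * H])                # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))       # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def self_attention(X):
    # Scaled dot-product self-attention over the view axis
    # (identity Q/K/V projections, for brevity).
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)              # (T, T) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)         # rows sum to 1
    return A @ X                               # each view: weighted mix of all views

# Hypothetical sizes: feature dim, hidden dim, number of views.
D, H, T = 8, 16, 5
W = rng.normal(scale=0.1, size=(D, 4 * H))
U = rng.normal(scale=0.1, size=(H, 4 * H))
b = np.zeros(4 * H)

views = rng.normal(size=(T, D))                # stand-in for per-view image features
h, c = np.zeros(H), np.zeros(H)
hidden = []
for x in views:                                # LSTM carries state across the view sequence
    h, c = lstm_step(x, h, c, W, U, b)
    hidden.append(h)
hidden = np.stack(hidden)                      # (T, H)

fused = self_attention(hidden)                 # cross-view correlation via attention
print(fused.shape)                             # (5, 16)
```

In this toy setup, the LSTM supplies sequential memory across views while the attention step mixes information between all views at once; coupling the two is one plausible way to encourage the cross-view consistency the abstract describes.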