SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

Published: 01 Jan 2025, Last Modified: 15 May 2025 · WACV 2025 · CC BY-SA 4.0
Abstract: Novel View Synthesis (NVS) for street scenes plays a critical role in autonomous driving simulation. Current mainstream methods, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), struggle to maintain rendering quality at viewpoints that deviate significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as conditions, while exploiting depth data from LiDAR point clouds to supply additional spatial information. We then apply the fine-tuned Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advantage in rendering images from broader views.
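To make the training-time use of the diffusion prior concrete, the sketch below shows one plausible form of the loop described in the abstract: render a perturbed (pseudo) view of the Gaussian scene, let the fine-tuned diffusion model refine it using an adjacent-frame image and a LiDAR depth map as conditions, and penalize the difference. The helper names (`render_3dgs`, `diffusion_refine`, `sample_pseudo_view`), the loss form, and the weighting are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch, assuming hypothetical helpers passed in by the caller:
#   render_3dgs(gaussians, camera)      -> rendered image tensor (3, H, W)
#   diffusion_refine(img, ...)          -> fine-tuned diffusion model acting as a prior,
#                                          conditioned on an adjacent frame and LiDAR depth
#   sample_pseudo_view(camera)          -> a perturbed, unseen camera pose
# None of these names come from the paper; they only illustrate the loop structure.

import torch
import torch.nn.functional as F

def training_step(gaussians, optimizer, batch,
                  render_3dgs, diffusion_refine, sample_pseudo_view,
                  lambda_reg=0.1):
    """One 3DGS optimization step with a diffusion-prior regularizer at an unseen view."""
    # 1) Standard photometric loss at a real training view.
    rendered = render_3dgs(gaussians, batch["camera"])
    loss = F.l1_loss(rendered, batch["image"])

    # 2) Regularize an unseen (pseudo) view with the fine-tuned diffusion prior.
    pseudo_cam = sample_pseudo_view(batch["camera"])
    pseudo_render = render_3dgs(gaussians, pseudo_cam)

    with torch.no_grad():
        # The diffusion model refines the degraded pseudo-view render,
        # conditioned on the adjacent-frame image and LiDAR depth.
        refined = diffusion_refine(pseudo_render,
                                   adjacent_image=batch["adjacent_image"],
                                   lidar_depth=batch["lidar_depth"])

    # Pull the pseudo-view render toward the diffusion-refined target.
    loss = loss + lambda_reg * F.l1_loss(pseudo_render, refined)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```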