Abstract: Novel-view synthesis with sparse input views is important for practical applications such as AR/VR and autonomous driving. Many works in this field integrate depth information into NeRF, using depth priors to aid geometric and spatial understanding. However, most existing methods either overlook inaccuracies in depth maps or handle them only coarsely, limiting synthesis quality. To address this issue, we propose a depth-guided robust point cloud fusion NeRF for sparse-input synthesis. We first construct a point cloud for each input view, using a novel point cloud representation based on learnable matrices and vectors. Then, an additional lightweight scene fusion network fuses the per-view point clouds into a point cloud of the entire scene. By optimizing the point cloud representation and the scene fusion network, inaccuracies in the depth maps can be adjusted and refined, yielding a more precise perception of the overall scene. The density and appearance of each voxel in the scene are determined by referencing the fused point cloud. Experimental results demonstrate that our method outperforms state-of-the-art baselines.
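The abstract only outlines the pipeline, so the following is a minimal PyTorch-style sketch of how such a system could be organized: per-view point clouds carrying learnable matrix/vector parameters, a lightweight fusion network that merges them into a scene-level cloud, and a per-voxel query against the fused cloud. All module names, shapes, hyperparameters, and the specific fusion and query operations are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline described in the abstract (not the authors' code).
import torch
import torch.nn as nn


class PerViewPointCloud(nn.Module):
    """Points unprojected from one view's depth map, with learnable corrections/features."""

    def __init__(self, num_points: int, feat_dim: int = 32):
        super().__init__()
        # Learnable matrix/vector parameters; optimizing them lets the model
        # compensate for inaccuracies in the input depth map (assumed design).
        self.feat_matrix = nn.Parameter(torch.randn(num_points, feat_dim) * 0.01)
        self.offset = nn.Parameter(torch.zeros(num_points, 3))  # per-point position refinement

    def forward(self, xyz_from_depth: torch.Tensor):
        # xyz_from_depth: (N, 3) points back-projected from the (possibly noisy) depth map
        return xyz_from_depth + self.offset, self.feat_matrix


class SceneFusionNet(nn.Module):
    """Lightweight MLP that fuses per-view point clouds into one scene-level cloud."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(inplace=True), nn.Linear(64, feat_dim)
        )

    def forward(self, per_view_xyz, per_view_feats):
        # Concatenate all views into one scene cloud, then refine features jointly.
        xyz = torch.cat(per_view_xyz, dim=0)
        feats = self.mlp(torch.cat(per_view_feats, dim=0))
        return xyz, feats


def query_voxel(voxel_xyz, cloud_xyz, cloud_feat, k: int = 8):
    """Gather density/appearance features for each voxel from its k nearest fused points."""
    dists = torch.cdist(voxel_xyz, cloud_xyz)      # (V, N) voxel-to-point distances
    knn = dists.topk(k, largest=False)             # nearest fused points per voxel
    weights = torch.softmax(-knn.values, dim=-1)   # closer points contribute more
    neighbor_feats = cloud_feat[knn.indices]       # (V, k, feat_dim)
    return (weights.unsqueeze(-1) * neighbor_feats).sum(dim=1)
```

In this sketch, the per-voxel feature returned by `query_voxel` would feed a NeRF-style decoder for density and color; the distance-weighted k-nearest-neighbor lookup is one plausible way to "reference the fused point cloud" and is purely an assumption here.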