Abstract: Although Neural Radiance Fields (NeRF) achieve impressive 3D reconstruction from dense view images, their performance degrades significantly when the training views are sparse. We observe that in the sparse-view setting, learning the correspondence of pixels across different views, i.e., the 3D consistency, is important for improving reconstruction quality. To this end, we first propose the Hard-Mask, which uses depth information to locate pixels that have a correspondence relationship and then assigns higher loss weights to these pixels. The key idea is to optimize NeRF in a pixel-wise differentiated manner, based on the 3D consistency between target views and source views, instead of treating every pixel equally. This optimization strategy helps NeRF-based algorithms learn fine-grained object details from limited data. For cases where accurate depth information is unavailable, we propose the Soft-Mask, which estimates the correspondence relationship from the trend of the training losses. Our method can serve as a plug-in component for existing NeRF-based view-synthesis models. Extensive experiments on recent representative works, including NeRF, IBRNet, and MVSNeRF, show that our method significantly improves model performance under sparse-view conditions (e.g., up to 70\% improvement in PSNR on the DTU dataset).
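
The Hard-Mask idea described in the abstract (find pixels with a cross-view correspondence from depth, then weight their loss more heavily) can be illustrated with a minimal NumPy sketch. This is one plausible reading of the abstract, not the authors' implementation. The function names `hard_mask_from_depth` and `masked_mse`, the relative depth tolerance `tol`, the weight `w_corr`, the shared intrinsics `K` for both views, and the normalization by the sum of weights are all assumptions introduced here for illustration.

```python
import numpy as np

def hard_mask_from_depth(depth_tgt, K, T_tgt_to_src, depth_src, tol=0.01):
    """Hypothetical depth-consistency check (an assumption, not the paper's rule).

    A target pixel is marked as having a correspondence if its back-projected
    3D point lands inside the source image and its depth in the source camera
    agrees with the source depth map within a relative tolerance `tol`.
    Both views are assumed to share intrinsics K and image size.
    """
    h, w = depth_tgt.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project target pixels to 3D: X = d * K^{-1} [u, v, 1]^T
    pts = (np.linalg.inv(K) @ pix.T) * depth_tgt.reshape(1, -1)
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])

    # Move the points into the source camera frame and project them.
    pts_src = (T_tgt_to_src @ pts_h)[:3]
    z = pts_src[2]
    front = z > 1e-8                      # keep points in front of the camera
    z_safe = np.where(front, z, 1.0)      # avoid division by zero
    proj = K @ pts_src
    us = np.rint(proj[0] / z_safe).astype(int)
    vs = np.rint(proj[1] / z_safe).astype(int)
    inside = front & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)

    # Compare the reprojected depth with the source depth map.
    mask = np.zeros(h * w, dtype=bool)
    idx = np.flatnonzero(inside)
    d_src = depth_src[vs[idx], us[idx]]
    mask[idx] = np.abs(d_src - z[idx]) <= tol * z[idx]
    return mask.reshape(h, w)

def masked_mse(pred, target, mask, w_corr=2.0):
    """Per-pixel squared error with a higher weight on masked pixels.

    `w_corr` and the normalization by the sum of weights are assumptions.
    """
    per_pixel = np.mean((pred - target) ** 2, axis=-1)
    weights = np.where(mask, w_corr, 1.0)
    return float(np.sum(weights * per_pixel) / np.sum(weights))
```

With `w_corr = 1.0` the loss reduces to the ordinary mean squared error over all pixels, which is the "treating every pixel equally" baseline that the abstract contrasts against.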
