Keywords: neural surface reconstruction, neural radiance fields, multi-view stereo, 3D reconstruction
TL;DR: We propose a sparse scene representation for generalizable neural surface reconstruction.
Abstract: Generalizable neural surface reconstruction has become a compelling technique for reconstructing 3D scenes from sparse input views without per-scene optimization. In these methods, dense 3D feature volumes have proven highly effective as a global scene representation. Unfortunately, this representation severely limits high-resolution modeling ability and reconstruction accuracy because its memory requirements scale cubically with voxel resolution. In this paper, we propose a novel sparse-representation approach that dramatically improves memory efficiency and enables more accurate surface reconstruction. Our method employs a two-stage pipeline: we first train a neural network to predict voxel occupancy probabilities from the given posed images, then restrict feature computation and volume rendering to the sparse set of voxels with sufficiently high occupancy estimates. To support this sparse representation, we develop specialized algorithms for efficient sampling, feature aggregation, and spatial querying that overcome the dense-volume assumptions of existing approaches. Extensive experiments on standard benchmarks demonstrate that our sparse representation enables scene reconstruction at $512^3$ resolution, compared to the typical $128^3$ resolution possible with existing methods on similar hardware. We also achieve superior reconstruction accuracy compared to current state-of-the-art approaches. Our work establishes sparse neural representations as a promising direction for scalable, high-quality 3D reconstruction.
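The abstract's second stage, keeping only voxels whose predicted occupancy exceeds a threshold, can be illustrated with a minimal sketch. This is not the paper's implementation: the random grid stands in for the occupancy network's output, and the threshold value `tau` is an assumption for illustration.

```python
import numpy as np

# Stand-in for the stage-one network output: a dense grid of predicted
# occupancy probabilities. (The paper scales this idea up to 512^3.)
rng = np.random.default_rng(0)
res = 64
occupancy = rng.random((res, res, res))

# Stage two: retain only voxels with sufficiently high occupancy, so
# feature computation and volume rendering touch a sparse subset.
tau = 0.9                                  # hypothetical threshold
sparse_idx = np.argwhere(occupancy > tau)  # (N, 3) coordinates of kept voxels

dense_voxels = res ** 3
kept = sparse_idx.shape[0]
print(f"kept {kept} of {dense_voxels} voxels ({kept / dense_voxels:.1%})")
```

With a uniform random grid, roughly 10% of voxels survive at `tau = 0.9`; for real scenes the surface region is far sparser, which is what makes the cubic memory cost avoidable.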
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7269