Abstract: We propose a cost volume-based neural network for depth inference from multi-view images. We demonstrate that building a cost volume pyramid in a coarse-to-fine manner, instead of constructing a cost volume at a fixed resolution, leads to a compact, lightweight network and allows us to infer high-resolution depth maps, which yields better reconstruction results. To this end, we first build a cost volume based on uniform sampling of fronto-parallel planes across the entire depth range at the coarsest resolution of an image. Then, given the current depth estimate, we iteratively construct new cost volumes to refine the depth map. We show that working on a cost volume pyramid leads to a more compact yet efficient network structure than existing work. We further show that the (residual) depth sampling can be fully determined by analytical geometric derivation, which serves as a principle for building a compact cost volume pyramid. To demonstrate the effectiveness of our proposed framework, we extend our cost volume pyramid structure to handle the unsupervised depth inference scenario. Experimental results on benchmark datasets show that our model runs 6x faster than state-of-the-art methods with similar performance in the supervised scenario, and demonstrates superior performance in the unsupervised scenario.
Code is available at https://github.com/JiayuYANG/CVP-MVSNet.
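The coarse-to-fine depth sampling described in the abstract can be illustrated with a small NumPy sketch. This is not taken from the authors' repository: the function names, the depth range, the number of hypotheses, the nearest-neighbour upsampling, and the fixed residual interval are all assumptions made for illustration. In particular, the abstract states that the residual sampling is determined by an analytical geometric derivation, which this sketch replaces with a hand-picked constant.

```python
import numpy as np


def uniform_depth_hypotheses(d_min, d_max, num_planes):
    """Depths of fronto-parallel planes sampled uniformly over [d_min, d_max]."""
    return np.linspace(d_min, d_max, num_planes)


def upsample_nearest(depth, factor=2):
    """Nearest-neighbour upsampling of an (H, W) depth map.

    Assumption: a stand-in for whatever upsampling the real network uses.
    """
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def residual_depth_hypotheses(depth, num_residuals, interval):
    """Per-pixel depth candidates centred on the current estimate.

    `interval` is a hypothetical fixed spacing; the paper derives the
    spacing geometrically instead. Returns shape (num_residuals, H, W).
    """
    offsets = (np.arange(num_residuals) - (num_residuals - 1) / 2.0) * interval
    return depth[None, :, :] + offsets[:, None, None]


# Coarsest level: planes cover the entire (illustrative) depth range.
coarse_planes = uniform_depth_hypotheses(1.0, 10.0, 10)

# Stand-in for the network's coarse prediction on a 4x5 image:
# pick one plane depth per pixel at random.
rng = np.random.default_rng(0)
coarse_depth = rng.choice(coarse_planes, size=(4, 5))

# Next pyramid level: upsample, then sample residual depths around
# the current estimate instead of the whole depth range.
fine_depth_init = upsample_nearest(coarse_depth)
candidates = residual_depth_hypotheses(fine_depth_init, num_residuals=4, interval=0.5)
```

Each finer level searches only a narrow band around the upsampled estimate, so the number of depth hypotheses per pixel stays small even as the image resolution grows. This is the intuition behind the compactness claim in the abstract.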