Abstract: Recent deep neural networks based on cost volume pyramids have unlocked the potential of efficiently leveraging
high-resolution images for depth inference from multi-view
stereo. In general, those approaches assume that the depth
of each pixel follows a unimodal distribution. Boundary
pixels, however, usually follow a multi-modal distribution because they represent different depths. The unimodal assumption therefore produces
an erroneous depth prediction at the coarser level of the
cost volume pyramid that cannot be corrected in the refinement levels, leading to wrong depth predictions. In contrast,
we propose constructing the cost volume by non-parametric
depth distribution modeling to handle pixels with unimodal
and multi-modal distributions. Our approach outputs multiple depth hypotheses at the coarser level to avoid errors
in the early stage. As we perform local search around
these multiple hypotheses in subsequent levels, our approach does not maintain the rigid spatial ordering of depths;
we therefore introduce a sparse cost aggregation network to derive information within each volume. We evaluate
our approach extensively on two benchmark datasets: DTU
and Tanks & Temples. Our experimental results show that
our model outperforms existing methods by a large margin
and achieves superior performance on boundary regions.
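
To make the boundary-pixel argument concrete, the following toy sketch (not the paper's implementation) contrasts a probability-weighted mean depth, which implicitly assumes a unimodal distribution, with keeping several high-probability depth candidates. The function names, the depth candidates, the probabilities, and the simple top-k selection are all illustrative assumptions standing in for the non-parametric modeling described above.

```python
def expected_depth(depths, probs):
    """Probability-weighted mean depth (implicitly assumes one mode)."""
    return sum(d * p for d, p in zip(depths, probs))


def top_k_hypotheses(depths, probs, k=2):
    """Keep the k most probable depth candidates, returned in depth order.

    Hypothetical stand-in for emitting multiple hypotheses at the coarse level.
    """
    ranked = sorted(zip(probs, depths), reverse=True)[:k]
    return sorted(d for _, d in ranked)


# Illustrative boundary pixel: probability mass split between a
# foreground surface (depth 1.0) and a background surface (depth 5.0).
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
probs = [0.45, 0.04, 0.02, 0.04, 0.45]

mean_depth = expected_depth(depths, probs)    # lands between the two surfaces
hypotheses = top_k_hypotheses(depths, probs)  # keeps both surfaces as candidates
```

In this example the weighted mean falls near depth 3.0, where the pixel has almost no probability mass, while the two retained hypotheses coincide with the two surfaces. A local search around a coarse estimate near 3.0 could not reach either surface, which is the failure mode the abstract attributes to the unimodal assumption.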