Abstract: Highlights
• A high-accuracy, geometry-consistent confidence prediction network that globally fuses spatial coherence and cross-view consistency.
• A novel tri-modal input to the confidence prediction network, in which the normal map computed from the noisy depth map via cross product serves as the spatial feature and the truncated signed distance function (TSDF) serves as the cross-view feature.
• An LSTM-based confidence refinement module that exploits the dependencies of deep features across recursive stages, improving confidence prediction performance.
• Extensive experiments on the GTAV synthetic MVS dataset, the ETH3D MVS dataset, and the KITTI stereo matching datasets demonstrate that the method performs significantly better than competing methods.
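The highlights state only that the normal map is obtained from the noisy depth map via a cross product. One common way to do this is sketched below. It is an illustration under stated assumptions, not the authors' implementation. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the forward-difference neighbours, the sign convention, and the function names are all hypothetical choices. A practical version would be vectorized.

```python
import math


def backproject(u, v, d, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth d to a 3D camera-frame point (pinhole model, assumed)."""
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate unit normals for an H x W depth map given as a list of rows.

    Returns an (H-1) x (W-1) grid, because each normal uses the right and
    lower neighbours of a pixel (forward differences, an assumption here).
    """
    h, w = len(depth), len(depth[0])
    normals = []
    for v in range(h - 1):
        row = []
        for u in range(w - 1):
            p = backproject(u, v, depth[v][u], fx, fy, cx, cy)
            p_right = backproject(u + 1, v, depth[v][u + 1], fx, fy, cx, cy)
            p_down = backproject(u, v + 1, depth[v + 1][u], fx, fy, cx, cy)
            # Two tangent vectors on the local surface patch.
            du = tuple(a - b for a, b in zip(p_right, p))
            dv = tuple(a - b for a, b in zip(p_down, p))
            # Their cross product is perpendicular to the patch.
            n = cross(du, dv)
            length = math.sqrt(sum(c * c for c in n))
            # Degenerate patches (e.g. zero depth) get a zero vector.
            row.append(tuple(c / length for c in n) if length > 0 else (0.0, 0.0, 0.0))
        normals.append(row)
    return normals
```

With this ordering of the tangent vectors, a fronto-parallel plane yields normals along +Z (pointing away from the camera). Negate the result if camera-facing normals are preferred. Because the normals come from finite differences, noise in the depth map propagates directly into them.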