Abstract: Recent supervised multi-view depth estimation networks
have achieved promising results. Similar to all supervised approaches, these networks require ground-truth data
during training. However, collecting a large amount of
multi-view depth data is very challenging. Here, we propose a self-supervised learning framework for multi-view
stereo that exploits pseudo labels from the input data. We
start by learning to estimate depth maps as initial pseudo
labels under an unsupervised learning framework relying on image reconstruction loss as supervision. We
then refine the initial pseudo labels using a carefully designed pipeline leveraging depth information inferred from
a higher resolution image and neighboring views. We use
these high-quality pseudo labels as the supervision signal to train the network and iteratively improve its performance through self-training. Extensive experiments on the
DTU dataset show that our proposed self-supervised learning framework outperforms existing unsupervised multi-view stereo networks by a large margin and performs
on par with the supervised counterpart. Code
is available at https://github.com/JiayuYANG/Self-supervised-CVP-MVSNet.
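
As a rough illustration of the image reconstruction loss that the abstract names as the initial supervision signal, the sketch below computes a masked mean absolute difference between a reference image and a source image that has already been warped into the reference view using the estimated depth. The abstract does not specify the exact form of the loss, so the L1 formulation, the validity mask, and the function name `photometric_l1` are assumptions made for illustration only, not the authors' implementation.

```python
def photometric_l1(reference, warped, valid):
    """Mean absolute intensity difference over pixels marked valid.

    reference, warped: 2D lists of floats (grayscale images, same shape).
    valid: 2D list of bools; False marks pixels whose warp fell outside
           the source image and should not contribute to the loss.
    Hypothetical helper: the abstract only says "image reconstruction loss".
    """
    total = 0.0
    count = 0
    for ref_row, warp_row, mask_row in zip(reference, warped, valid):
        for r, w, m in zip(ref_row, warp_row, mask_row):
            if m:
                total += abs(r - w)
                count += 1
    # Return 0.0 when no pixel is valid, to avoid dividing by zero.
    return total / count if count else 0.0


# Toy 2x2 example: one pixel is masked out as invalid.
reference = [[0.0, 1.0], [0.5, 0.5]]
warped = [[0.0, 0.5], [0.5, 1.0]]
valid = [[True, True], [True, False]]
loss = photometric_l1(reference, warped, valid)
```

A lower value means the estimated depth explains the reference image better, which is what lets such a loss stand in for ground-truth depth when producing the initial pseudo labels.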