Deep multi-view stereo for dense 3D reconstruction from monocular endoscopic video

Gwangbin Bae, Ignas Budvytis, Chung-Kwong Yeung, Roberto Cipolla

04 Nov 2022 (modified: 04 Nov 2022), OpenReview Archive Direct Upload

Abstract: 3D reconstruction from monocular endoscopic images is a challenging task. State-of-the-art multi-view stereo (MVS) algorithms based on image patch similarity often fail to obtain a dense reconstruction from weakly textured endoscopic images. In this paper, we present a novel deep-learning-based MVS algorithm that produces a dense and accurate 3D reconstruction from a monocular endoscopic image sequence. Our method consists of three key steps. First, a number of depth candidates are sampled around the depth prediction made by a pre-trained CNN. Second, each candidate is projected to the other images in the sequence, and the matching score is measured using a patch embedding network that maps each image patch to a compact embedding. Finally, the candidate with the highest score is selected for each pixel. Experiments on colonoscopy videos demonstrate that our patch embedding network outperforms zero-normalized cross-correlation and a state-of-the-art stereo matching network in terms of matching accuracy, and that our MVS algorithm produces a reconstruction several orders of magnitude denser than those of competing methods when the same accuracy filtering is applied.
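
The three steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the relative sampling range, the pinhole projection with known intrinsics and relative pose, and the use of cosine similarity between embeddings as the matching score are all assumptions made for illustration. The patch embedding network itself is not modelled; any function that maps a patch to a vector could stand in for it.

```python
import numpy as np


def sample_depth_candidates(prior_depth, num_candidates=5, rel_range=0.2):
    """Step 1 (assumed scheme): sample candidates evenly in
    [d * (1 - rel_range), d * (1 + rel_range)] around the per-pixel
    prior depth d, e.g. a CNN depth prediction."""
    factors = np.linspace(1.0 - rel_range, 1.0 + rel_range, num_candidates)
    return prior_depth[..., None] * factors  # shape (H, W, num_candidates)


def project_to_source(u, v, depth, intrinsics, rotation, translation):
    """Step 2a: back-project reference pixel (u, v) at `depth`, then
    project the 3D point into a source view. Assumes a pinhole camera and
    a known reference-to-source pose (rotation, translation)."""
    ray = np.linalg.inv(intrinsics) @ np.array([u, v, 1.0])
    point_ref = depth * ray
    point_src = rotation @ point_ref + translation
    uvw = intrinsics @ point_src
    return uvw[:2] / uvw[2]


def cosine_score(embedding_a, embedding_b):
    """Step 2b (assumed score): cosine similarity of two patch embeddings."""
    denom = np.linalg.norm(embedding_a) * np.linalg.norm(embedding_b) + 1e-8
    return float(embedding_a @ embedding_b / denom)


def select_depth(candidates, scores):
    """Step 3: keep the highest-scoring candidate at every pixel.
    Both inputs have shape (H, W, num_candidates)."""
    best = np.argmax(scores, axis=-1)
    return np.take_along_axis(candidates, best[..., None], axis=-1)[..., 0]
```

In use, the scores for one pixel would be accumulated over the source images by embedding the reference patch and the patch found at each projected location, and `select_depth` would then be applied to the resulting score volume.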
