Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency
Abstract: Relative monocular depth, the inference of depth up to a shift and scale from a single image, is an active research topic. Recent deep learning models, trained on large and varied meta-datasets, now provide excellent performance in the domain of natural images. However, few datasets exist that provide ground-truth depth for endoscopic images, making training such models from scratch infeasible. This work investigates the transfer of these models into the surgical domain, and presents a simple and effective way to improve on standard supervision through the use of temporal consistency self-supervision. We show that temporal consistency significantly improves on supervised training alone when transferring to the low-data regime of endoscopy, and that it outperforms the prevalent self-supervision technique for this task. In addition, we show that our method drastically outperforms the state-of-the-art method from within the domain of endoscopy. We also release our code, models, and ensembled meta-dataset, Meta-MED, establishing a strong benchmark for future work.
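The abstract names temporal consistency self-supervision but does not define it here. The sketch below shows one plausible formulation, assuming relative depth predictions on consecutive frames are warped into alignment with a precomputed optical flow and compared after a least-squares scale-and-shift fit. The function names (`temporal_consistency_loss`, `align_scale_shift`), the flow-based warping, and the L1 penalty are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F


def align_scale_shift(pred, target):
    """Least-squares scale/shift alignment of `pred` to `target`."""
    p, t = pred.flatten(), target.flatten()
    p_mean, t_mean = p.mean(), t.mean()
    s = ((p - p_mean) * (t - t_mean)).sum() / ((p - p_mean) ** 2).sum().clamp(min=1e-6)
    b = t_mean - s * p_mean
    return s * pred + b


def temporal_consistency_loss(depth_t, depth_t1, flow_t_to_t1):
    """Penalise disagreement between the depth map at frame t (B, 1, H, W)
    and the flow-warped, affine-aligned depth map at frame t+1."""
    B, _, H, W = depth_t.shape
    # Sampling grid: pixel coordinates shifted by the forward optical flow,
    # normalised to [-1, 1] as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=depth_t.device, dtype=depth_t.dtype),
        torch.arange(W, device=depth_t.device, dtype=depth_t.dtype),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0)        # (1, H, W, 2)
    grid = base + flow_t_to_t1.permute(0, 2, 3, 1)           # (B, H, W, 2)
    grid = torch.stack(
        (2.0 * grid[..., 0] / (W - 1) - 1.0,
         2.0 * grid[..., 1] / (H - 1) - 1.0),
        dim=-1,
    )
    warped = F.grid_sample(depth_t1, grid, align_corners=True)  # (B, 1, H, W)
    # Relative depth is only defined up to scale and shift, so align each
    # warped prediction to its counterpart before comparing.
    aligned = torch.stack(
        [align_scale_shift(w, d) for w, d in zip(warped, depth_t)]
    )
    return F.l1_loss(aligned, depth_t)
```

In practice such a term would usually be combined with an occlusion mask (e.g. from forward-backward flow consistency) so that pixels without a valid correspondence do not contribute to the loss.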