DeepV2D: Video to Depth with Differentiable Structure from Motion

Zachary Teed; Jia Deng

Published: 20 Dec 2019, Last Modified: 22 Jun 2025 | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: DeepV2D predicts depth from a video clip by composing elements of classical SfM into a fully differentiable network.

Abstract: We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth.
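The alternation described in the abstract can be pictured with a minimal sketch. This is not the DeepV2D implementation: the function `alternate`, its parameters, the order of updates within an iteration, and the toy scalar update rules are all hypothetical stand-ins for the trainable motion and depth modules, chosen only so the loop has a known fixed point.

```python
from typing import Callable, Tuple


def alternate(
    depth: float,
    motion: float,
    update_motion: Callable[[float, float], float],
    update_depth: Callable[[float, float], float],
    num_iters: int = 20,
) -> Tuple[float, float]:
    """Alternate two estimators, each refining its output from the other's.

    Hypothetical sketch: in a real system `depth` would be a depth map and
    `motion` a set of camera poses; scalars are used here for illustration.
    """
    for _ in range(num_iters):
        # Motion step: re-estimate motion given the current depth.
        motion = update_motion(depth, motion)
        # Depth step: re-estimate depth given the updated motion.
        depth = update_depth(motion, depth)
    return depth, motion


# Toy stand-in updates (assumptions, not the paper's modules). Together they
# form a contraction with fixed point depth = 4/3, motion = 2/3.
def toy_update_motion(depth: float, motion: float) -> float:
    return 0.5 * depth


def toy_update_depth(motion: float, depth: float) -> float:
    return 0.5 * motion + 1.0


final_depth, final_motion = alternate(0.0, 0.0, toy_update_motion, toy_update_depth)
```

In this toy setting each full iteration shrinks the depth error by a constant factor, which mirrors in miniature the idea of the two stages converging together; it says nothing about the convergence behavior of the actual network.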

Keywords: Structure-from-Motion, Video to Depth, Dense Depth Estimation

Code: [princeton-vl/DeepV2D](https://github.com/princeton-vl/DeepV2D)

Data: [SUN3D](https://paperswithcode.com/dataset/sun3d), [ScanNet](https://paperswithcode.com/dataset/scannet)

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/deepv2d-video-to-depth-with-differentiable/code)
