Deep Unsupervised Learning for Simultaneous Visual Odometry and Depth Estimation

Yawen Lu, Guoyu Lu

Published: 2019, Last Modified: 11 Jun 2024, ICIP 2019, CC BY-SA 4.0

Abstract: Visual odometry and depth estimation are critical to understanding the scene and camera motion, and are particularly helpful for tasks such as scene understanding, autonomous driving, and robotics. Supervised learning methods have been applied in many deep neural network frameworks and have demonstrated outstanding results in visual odometry and depth estimation. However, supervised learning requires a large amount of labeled training data, which is time-consuming to obtain. In this paper, we explore an unsupervised learning framework that learns a camera pose regressor from monocular video frames and simultaneously estimates the scene depth. The proposed method performs accurate pose prediction and depth estimation despite the absence of any ground-truth data. Its effectiveness is demonstrated through experiments on the KITTI, Cityscapes, and Make3D benchmark datasets, which show superb results compared with state-of-the-art methods in both tasks.