Abstract: We address the task of monocular depth estimation in the multi-domain setting. Given a large dataset (source) with ground-truth depth maps and a set of unlabeled datasets (targets), our goal is to create a single model that works well on the target datasets despite their differing scenes. This is a challenging problem when there is significant domain shift, which often results in poor performance on the target datasets. We propose a unified approach that combines adversarial knowledge distillation with uncertainty-guided self-supervised reconstruction. We provide both quantitative and qualitative evaluations on four datasets: KITTI, Virtual KITTI, UAVid China, and UAVid Germany. These datasets span widely varying viewpoints, including ground-level and overhead perspectives, a more challenging setting than is typically considered in prior work on domain adaptation for single-image depth estimation. Our approach significantly improves upon conventional domain adaptation baselines and requires no additional memory as the number of target datasets increases.