Abstract: Depth completion predicts a dense depth map from a sparse depth map and the aligned RGB image, but acquiring ground-truth annotations is labor-intensive and does not scale. We therefore resort to semi-supervised learning, where only a few images need to be annotated and massive unlabeled data can be leveraged to facilitate model learning. In this paper, we propose SEED, a SElf-Ensembling Depth completion framework that enhances the generalization of the model on unlabeled data. Specifically, SEED contains a pair of teacher and student models, which take high-density and low-density sparse depth maps as input, respectively. The main idea underpinning SEED is to enforce density-aware consistency by encouraging consistent predictions across input depth maps of different densities. One empirical challenge is that the pseudo-depth labels produced by the teacher model inevitably contain incorrect depth values, which can mislead the convergence of the student model. To resist such noisy labels, we propose an automatic method that adaptively measures the reliability of the generated pseudo-depth labels. Leveraging the discrepancy between prediction distributions, we model a pixel-wise uncertainty map as the prediction variance and use it to explicitly rectify training under noisy labels. To our knowledge, ours is among the early semi-supervised attempts at the depth completion task. Extensive experiments on both outdoor and indoor datasets demonstrate that SEED consistently improves the performance of the baseline model by a large margin and is even on par with several fully-supervised methods.
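To make the density-aware consistency and uncertainty rectification concrete, below is a minimal PyTorch sketch under stated assumptions: the subsampling scheme (`subsample_sparse_depth`), the number of views, the exponential down-weighting, and the EMA teacher update are all illustrative choices of ours, not the paper's exact specification.

```python
import torch


def subsample_sparse_depth(sparse_depth, keep_ratio=0.5):
    """Randomly drop valid depth points to form a lower-density input.
    Hypothetical helper; the paper's actual sampling scheme may differ."""
    valid = sparse_depth > 0
    keep = torch.rand_like(sparse_depth) < keep_ratio
    return torch.where(valid & keep, sparse_depth, torch.zeros_like(sparse_depth))


@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Update teacher weights as an exponential moving average of the student's,
    a common choice for self-ensembling teacher-student frameworks."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)


def density_consistency_loss(teacher, student, rgb, sparse_depth, num_views=2):
    """Uncertainty-rectified consistency between high- and low-density predictions.

    The teacher sees the full (high-density) sparse map and produces pseudo-depth;
    the student sees a subsampled (low-density) map. Pixel-wise uncertainty is
    approximated as the variance of teacher predictions over several low-density
    views, and unreliable pixels are down-weighted in the consistency loss.
    """
    with torch.no_grad():
        pseudo_depth = teacher(rgb, sparse_depth)  # high-density input
        preds = torch.stack(
            [teacher(rgb, subsample_sparse_depth(sparse_depth))
             for _ in range(num_views)],
            dim=0,
        )
        uncertainty = preds.var(dim=0)             # prediction variance per pixel
        weight = torch.exp(-uncertainty)           # suppress noisy pseudo-labels

    student_pred = student(rgb, subsample_sparse_depth(sparse_depth))  # low-density
    return (weight * (student_pred - pseudo_depth).abs()).mean()
```

In a training loop, one would compute this loss on unlabeled batches alongside a supervised loss on the few labeled images, then call `ema_update(teacher, student)` after each optimizer step; again, this is a sketch of the general recipe rather than the authors' implementation.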