Improving Depth Completion via Depth Feature Upsampling

Published: 01 Jan 2024, Last Modified: 06 Mar 2025, Venue: CVPR 2024, License: CC BY-SA 4.0
Abstract: The encoder-decoder network (ED-Net) is a common choice for existing depth completion methods, but its working mechanism is unclear. In this paper, we visualize the internal feature maps to analyze how the network densifies the input sparse depth. We find that the encoder features of ED-Net focus on the areas surrounding the input depth points. To obtain a dense feature and thus estimate complete depth, the decoder feature tends to complement and enhance the encoder feature through skip connections so that the fused encoder-decoder feature becomes dense, which leaves the decoder feature itself sparse as well. However, ED-Net obtains the sparse decoder feature from the dense fused feature at the previous stage, and this “dense→sparse” process destroys the completeness of features and loses information. To address this issue, we present a depth feature upsampling network (DFU) that explicitly utilizes these dense features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one. The completeness of features is maintained throughout the upsampling process, thus avoiding information loss. Furthermore, we propose a confidence-aware guidance module (CGM) that performs guidance with adaptive receptive fields (GARF) to fully exploit the potential of these dense features as guidance. Experimental results show that our DFU, a plug-and-play module, can significantly improve the performance of existing ED-Net based methods with limited computational overhead, and new state-of-the-art (SOTA) results are achieved. In addition, the generalization capability on sparser depth is also enhanced. Project page: https://npucvr.github.io/DFU.
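The idea of using a dense feature to guide the upsampling of an LR depth feature can be illustrated with a small sketch. This is not the DFU, CGM, or GARF implementation: it swaps in a hand-crafted, joint-bilateral-style weighting with a fixed 3x3 window in place of the learned, adaptive-receptive-field guidance. The function name `guided_upsample` and the parameters `scale` and `sigma` are hypothetical, and feature maps are assumed to be single-channel nested lists.

```python
import math

def guided_upsample(lr_feat, lr_conf, guide, scale=2, sigma=0.5):
    """Upsample a 2D LR feature map using a dense HR guidance map.

    lr_feat, lr_conf: H x W nested lists (feature values, confidences in [0, 1]).
    guide: (H*scale) x (W*scale) nested list holding the dense guidance feature.
    Hand-crafted stand-in for learned guidance; illustration only.
    """
    h, w = len(lr_feat), len(lr_feat[0])
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for y in range(h * scale):
        for x in range(w * scale):
            cy, cx = y // scale, x // scale  # LR cell that contains this HR pixel
            num = den = 0.0
            # Fixed 3x3 LR neighbourhood, clipped at the borders.
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    # Guidance similarity between the HR pixel and the LR sample.
                    diff = guide[y][x] - guide[ny * scale][nx * scale]
                    weight = lr_conf[ny][nx] * math.exp(-diff * diff / (2 * sigma ** 2))
                    num += weight * lr_feat[ny][nx]
                    den += weight
            # Fall back to the containing LR cell if every weight is zero.
            out[y][x] = num / den if den > 0 else lr_feat[cy][cx]
    return out
```

Because the weights are normalized at every HR pixel, the output stays dense at each step, and low-confidence LR samples contribute less. In a learned module the weights and the neighbourhood size would be predicted by the network rather than fixed as they are here.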