Super-Resolution for Monocular Depth Estimation With Multi-Scale Sub-Pixel Convolutions and a Smoothness Constraint

Abstract: Depth estimation from a monocular image is of paramount importance in various vision tasks, such as obstacle detection, robot navigation, and 3D reconstruction. However, how to get an accurate depth map with clear details and a fine resolution remains an unresolved issue. As an attempt to solve this problem, we exploit image super-resolution concepts and techniques for monocular depth estimation and propose a novel CNN-based approach, namely <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$MSCN_{NS}$ </tex-math></inline-formula> , which involves multi-scale sub-pixel convolutions and a neighborhood smoothness constraint. Specifically, <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$MSCN_{NS}$ </tex-math></inline-formula> makes use of sub-pixel convolutions with multi-scale fusions to retrieve a high-resolution depth map with fine details of the scene. Different from previous multi-scale fusion strategies, those multi-scale features come from supervised scale branches of the network. Furthermore, <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$MSCN_{NS}$ </tex-math></inline-formula> incorporates a neighborhood smoothness regularization term to make sure that spatially closer pixels with similar features would have close depth values. The effectiveness and efficiency of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$MSCN_{NS}$ </tex-math></inline-formula> have been corroborated through extensive experiments conducted on benchmark datasets.
0 Replies
Loading