Abstract: Deep neural networks have recently achieved strong results on single-image depth estimation. However, current work on this topic exhibits an apparent trade-off between accuracy and network size. This work proposes an accurate and lightweight framework for monocular depth estimation, built on a self-attention mechanism that exploits the structure provided by spatial keypoints. Specifically, we use the keypoints to train a Salient Net that boosts depth estimation performance. In addition, we introduce a normalized Hessian loss term that is invariant to scaling and shear along the depth direction, which is shown to substantially improve accuracy. The proposed method achieves state-of-the-art results on KITTI and NYU-Depth-v2 while being at least three times more compact and requiring no extra training data. Experiments on SUN RGB-D further demonstrate the generalizability of the proposed method.
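To make the invariance claim concrete, the following is a minimal sketch of how a normalized Hessian loss term could be computed. It is an illustrative interpretation, not the authors' exact formulation: it assumes the Hessian is approximated by second-order finite differences and normalized by its mean magnitude, so that a global depth scaling cancels out and affine shear planes (whose second derivatives vanish) drop out of the Hessian entirely.

```python
import numpy as np

def hessian_components(d):
    """Second-order finite differences of a depth map d of shape (H, W)."""
    dxx = d[:, 2:] - 2.0 * d[:, 1:-1] + d[:, :-2]          # d^2 d / dx^2
    dyy = d[2:, :] - 2.0 * d[1:-1, :] + d[:-2, :]          # d^2 d / dy^2
    dxy = (d[2:, 2:] - d[2:, :-2]
           - d[:-2, 2:] + d[:-2, :-2]) / 4.0               # mixed derivative
    return dxx, dyy, dxy

def normalized_hessian_loss(pred, gt, eps=1e-8):
    """L1 distance between Hessians, each normalized by its mean magnitude.

    Normalization cancels a global depth scaling d -> s*d; shear planes
    d -> d + p*x + q*y contribute zero second derivative, so the term is
    invariant to scaling and shear along the depth direction.
    """
    loss = 0.0
    for hp, hg in zip(hessian_components(pred), hessian_components(gt)):
        hp = hp / (np.mean(np.abs(hp)) + eps)
        hg = hg / (np.mean(np.abs(hg)) + eps)
        loss += np.mean(np.abs(hp - hg))
    return loss
```

As a sanity check, a prediction that differs from the ground truth only by a depth scaling plus a shear plane, e.g. `pred = 3 * gt + 0.5 * x + 0.2 * y + 1`, yields a loss of (numerically) zero under this sketch.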