Abstract: Learning depth from a single image is an important problem in computer vision. To solve it, an encoder-decoder architecture is commonly employed to learn the dense correspondence function. In this work, we propose a symmetrical Spindle encoder-decoder network to learn fine-grained depth. Unlike traditional convolutional neural networks, we first lift the feature maps from a low-dimensional space to a high-dimensional space and then extract features for monocular depth learning. To overcome the limitations of computer memory, a single-image super-resolution technique is proposed to replace this lifting process by fusing local cues along edge directions. Given the super-resolved images, monocular depth learning requires more global information than most architectures provide for pixel-wise prediction. To address this issue, a dilated-kernel method is proposed to enlarge the receptive field in each layer. On the super-resolution task, the proposed method achieves better performance than state-of-the-art methods. Extensive experiments on monocular depth inference demonstrate that the Spindle network achieves performance comparable to state-of-the-art algorithms on the NYU and Make3D datasets. The proposed method offers a new perspective on learning depth from a single image and shows promising generality to other pixel-wise prediction problems.
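To illustrate why dilated kernels enlarge the receptive field without adding parameters, the following is a minimal NumPy sketch (not the authors' implementation; the paper presumably uses 2-D dilated convolutions, while this 1-D version shows only the receptive-field arithmetic, with illustrative kernel sizes and dilation rates):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D convolution with a dilated kernel (no padding, stride 1)."""
    k = len(w)
    span = (k - 1) * dilation + 1          # effective extent of one dilated kernel
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

def receptive_field(kernel_sizes, dilations):
    """Receptive field of a stack of stride-1 dilated convolution layers."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Three 3-tap layers with dilations 1, 2, 4 cover 15 input pixels,
# versus 7 for the same undilated stack -- identical parameter count.
print(receptive_field([3, 3, 3], [1, 2, 4]))   # 15
print(receptive_field([3, 3, 3], [1, 1, 1]))   # 7
```

Stacking layers with geometrically growing dilation rates is the standard way such methods expand context for dense, pixel-wise prediction while keeping resolution and parameter count fixed.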