- Keywords: stereo matching, feature extraction network, convolution neural network, receptive field
- TL;DR: We introduced a shallow featrue extraction network with a large receptive field for stereo matching tasks, which uses a simple structure to get better performance.
- Abstract: Stereo matching is one of the important basic tasks in the computer vision field. In recent years, stereo matching algorithms based on deep learning have achieved excellent performance and become the mainstream research direction. Existing algorithms generally use deep convolutional neural networks (DCNNs) to extract more abstract semantic information, but we believe that the detailed information of the spatial structure is more important for stereo matching tasks. Based on this point of view, this paper proposes a shallow feature extraction network with a large receptive field. The network consists of three parts: a primary feature extraction module, an atrous spatial pyramid pooling (ASPP) module and a feature fusion module. The primary feature extraction network contains only three convolution layers. This network utilizes the basic feature extraction ability of the shallow network to extract and retain the detailed information of the spatial structure. In this paper, the dilated convolution and atrous spatial pyramid pooling (ASPP) module is introduced to increase the size of receptive field. In addition, a feature fusion module is designed, which integrates the feature maps with multiscale receptive fields and mutually complements the feature information of different scales. We replaced the feature extraction part of the existing stereo matching algorithms with our shallow feature extraction network, and achieved state-of-the-art performance on the KITTI 2015 dataset. Compared with the reference network, the number of parameters is reduced by 42%, and the matching accuracy is improved by 1.9%.