Abstract: Stereo matching is a pivotal technique for depth estimation and is widely used in computer vision tasks. Although many stereo matching methods have been proposed recently, they still face several challenges: sharp disparity variations at object boundaries, difficult prediction in large-disparity regions, and suboptimal generalization when the label distribution differs between source and target domains. We therefore propose a stereo matching model, EGLCR-Stereo, that combines edge structure information with adaptively fused multi-scale matching similarity information for disparity estimation. First, a lightweight network predicts the initial disparity. Large-scale and small-scale similarity feature extraction modules then extract matching similarity information over a wide receptive field and refined matching similarity information over a local receptive field, respectively. Next, a scale-adaptive attention module efficiently fuses the information from the different scales. In parallel, an edge structure-aware module extracts edge information from the scene. Finally, an iterative strategy estimates disparity from the edge structure information and the fused multi-scale matching similarity information. We conduct extensive experiments on popular stereo matching datasets, including Middlebury, KITTI, ETH3D, and Scene Flow. The results show that EGLCR-Stereo achieves state-of-the-art performance in both accuracy and generalization.
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This paper aims to reconstruct scene depth from binocular (stereo) image pairs. The left and right views differ, and the proposed method predicts the disparity map of the reference view by fusing edge structure information with matching similarity information at different scales. Given the camera parameters, the disparity map can then be converted into a three-dimensional reconstruction of the scene. As an important form of media, 3D stereo information has broad applications in multimedia, such as autonomous driving, games, movies, and TV. The proposed method can provide more robust and accurate 3D stereo visual information.
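As an illustration of the disparity-to-3D step mentioned above, the following is a minimal sketch of the standard pinhole relation for a rectified stereo pair, Z = f * B / d. It is not taken from the paper: the function names and the parameters `focal_px`, `baseline_m`, `cx`, and `cy` are hypothetical, and the sketch assumes rectified images with a shared focal length in pixels.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert one disparity value (pixels) to metric depth, Z = f * B / d.

    Assumes a rectified stereo pair. Non-positive disparities are treated
    as points at infinity (no valid match).
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


def reproject(u, v, disparity_px, focal_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) of the reference view to a 3D point (X, Y, Z).

    (cx, cy) is the principal point in pixels (an assumed camera parameter).
    """
    z = disparity_to_depth(disparity_px, focal_px, baseline_m)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 0.5 m baseline.
    print(reproject(700, 400, 50.0, 1000.0, 0.5, 600, 400))
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant points (small disparities) than for nearby ones.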
Supplementary Material: zip
Submission Number: 3626