Abstract: Stereo matching is one of the most important low-level visual perception tasks. Currently, two-stage 2D-3D networks are the dominant solution. These methods build a cost volume from low-resolution stereo feature maps, which splits the network into a feature net and a matching net. However, two-stage methods may accumulate errors, and a low-resolution cost volume may lose some of the matching information. To overcome these problems, we propose the first one-stage deep stereo network, named StereoOne. It includes an efficient module that builds a cost volume at image resolution in real time, and feature extraction and matching are learned in a single 3D network. In our experiments, the new network outperforms 2D-3D network baselines and achieves performance competitive with the state of the art.
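For readers unfamiliar with the term, the following is a minimal sketch of what a cost volume at image resolution looks like. It is not the StereoOne module, whose design is not described here. It assumes rectified image pairs and a plain absolute-difference matching cost, and the names `build_cost_volume`, `winner_takes_all`, and `max_disp` are hypothetical, introduced only for illustration.

```python
import numpy as np


def build_cost_volume(left, right, max_disp):
    """Return an absolute-difference cost volume of shape (max_disp, H, W).

    Illustrative assumption: inputs are rectified images of shape
    (H, W) or (H, W, C), and the cost is a simple per-pixel difference.
    """
    left = np.asarray(left, dtype=np.float32)
    right = np.asarray(right, dtype=np.float32)
    if left.ndim == 2:
        # Add a channel axis so grayscale and multi-channel inputs share one path.
        left = left[..., None]
        right = right[..., None]
    h, w, _ = left.shape

    # Positions with no valid match (x - d < 0) keep an infinite cost.
    volume = np.full((max_disp, h, w), np.inf, dtype=np.float32)
    for d in range(min(max_disp, w)):
        # Left pixel x is compared with right pixel x - d.
        diff = np.abs(left[:, d:, :] - right[:, : w - d, :])
        volume[d, :, d:] = diff.mean(axis=-1)
    return volume


def winner_takes_all(volume):
    """Pick, per pixel, the disparity with the lowest cost."""
    return volume.argmin(axis=0)
```

The volume here has one cost slice per candidate disparity at full image resolution, so its memory footprint grows as `max_disp * H * W`. That growth is the usual reason two-stage methods build the volume from downsampled feature maps instead.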