Region Separable Stereo Matching

Published: 01 Jan 2023, Last Modified: 05 Mar 2025 | IEEE Trans. Multim. 2023 | CC BY-SA 4.0
Abstract: Convolutional neural networks (CNNs) have shown strong performance for stereo matching. However, CNN-based methods share convolution weights across spatial positions, which creates a dilemma: weights suited to aggregating contextual information in smooth regions often blur local matching details in textured regions, and vice versa. This paper addresses the dilemma with a novel region separable stereo matching (RSSM) method, which is applicable to CNN stereo models based on 4D cost volumes and can greatly improve the accuracy and efficiency of existing models. The key idea of our method is to automatically group image pixels into regions according to their gradients, and then to construct and process the cost volume of each region separately. To perform cost aggregation, we propose a two-stage network consisting of regional grouping aggregation (RGA) and regional fusion aggregation (RFA). In RGA, convolutions are grouped channel-wise, and each group of convolutions learns dedicated weights for its corresponding region via regional supervision. Through RGA, each group of convolutions can extract the most representative features from its region. In RFA, we combine the matching cues of all convolution groups from RGA to output the final prediction map. We further extend the idea of regional grouping to feature extraction and modify the skip connections in the aggregation networks to better adapt our method to stereo matching models. Experimental results on five public datasets show that our method can significantly improve several state-of-the-art 3D CNN based stereo models.
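The region grouping idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how gradients are computed, how regions are delimited, or how the per-region cost volumes are built. The sketch assumes finite-difference gradients, thresholds placed at equally spaced quantiles of the gradient magnitude, and a simple masking of a precomputed 4D cost volume of shape (channels, disparities, height, width). The function names `group_pixels_by_gradient` and `split_cost_volume` are hypothetical.

```python
import numpy as np


def group_pixels_by_gradient(image, num_regions=2):
    """Label each pixel from 0 (smoothest) to num_regions - 1 (most textured).

    Assumption: regions are delimited by equally spaced quantiles of the
    gradient magnitude. The paper may use a different rule.
    """
    image = np.asarray(image, dtype=np.float64)
    # Finite-difference gradients along rows (axis 0) and columns (axis 1).
    grad_rows, grad_cols = np.gradient(image)
    magnitude = np.hypot(grad_rows, grad_cols)
    # Interior quantile levels, e.g. [0.5] for two regions.
    quantile_levels = np.linspace(0.0, 1.0, num_regions + 1)[1:-1]
    thresholds = np.quantile(magnitude, quantile_levels)
    # right=True sends values equal to a threshold to the smoother region.
    return np.digitize(magnitude, thresholds, right=True)


def split_cost_volume(cost_volume, labels, num_regions):
    """Mask a (C, D, H, W) cost volume into one volume per region.

    Returns an array of shape (num_regions, C, D, H, W) in which entries
    outside a region are zero. Assumption: masking a shared volume stands in
    for the paper's per-region cost volume construction.
    """
    masks = np.stack([labels == r for r in range(num_regions)])
    masks = masks.astype(cost_volume.dtype)  # (R, H, W)
    return cost_volume[None] * masks[:, None, None, :, :]


# Usage: a flat left half and a ramp on the right half of a toy image.
image = np.zeros((8, 8))
image[:, 4:] = np.arange(4) * 5.0
labels = group_pixels_by_gradient(image, num_regions=2)
cost_volume = np.random.default_rng(0).normal(size=(2, 3, 8, 8))
regional_volumes = split_cost_volume(cost_volume, labels, num_regions=2)
```

Because the masks partition the image, summing the regional volumes over the region axis recovers the original cost volume. In a grouped aggregation stage of the kind the abstract describes, each regional volume would then be processed by its own group of convolution weights.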