Exploring the Usage of Pre-trained Features for Stereo Matching

Published: 01 Jan 2024, Last Modified: 11 Nov 2024Int. J. Comput. Vis. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: For many vision tasks, utilizing pre-trained features results in improved performance and consistently benefits from the rapid advancement of pre-training technologies. However, in the field of stereo matching, the use of pre-trained features has not been extensively researched. In this paper, we present the first systematical exploration into the utilization of pre-trained features for stereo matching. To provide flexible employment for any combination of pre-trained backbones and stereo matching networks, we develop the deformable neck (DN) that decouples the network architectures of these two components. The core idea of DN is to utilize the deformable attention mechanism to iteratively fuse pre-trained features from shallow to deep layers. Empirically, our exploration reveals the crucial factors that influence using pre-trained features for stereo matching. We further investigate the role of instance-level information of pre-trained features, demonstrating it benefits stereo matching while can be suppressed during convolution-based feature fusion. Built on the attention mechanism, the proposed DN module effectively utilizes the instance-level information in pre-trained features. Besides, we provide an understanding of the efficiency-accuracy tradeoff, concluding that using pre-trained features can also be a good alternative with efficiency consideration.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview