Going Deeper with General and Specific Inductive Bias for Real-Time Stereo Matching

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Stereo Matching, Inductive Bias, Deep Supervision
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Inductive bias (IB) has brought about a major transformation by combining the strengths of CNNs and Transformers, including scale invariance and the integration of locality with long-range dependencies; we call this the general IB because of its wide applicability. However, stereo matching, a geometric vision task, does not yet benefit from it, because volume-level scale invariance has been overlooked and the task imposes strict real-time requirements. Instead, stereo matching relies on a specific IB: a volume structure is constructed and eventually used to generate a confidence volume from which the disparity map (the output) is predicted, yet few studies examine this volume structure in depth. To address these issues, this paper develops a novel model, UStereo, that introduces the general IB into stereo matching. Technically, we adopt inter-layer fusion to decompose volume-level scale invariance into a recurrence strategy, with initialization for low-resolution information and a refinement process for high-resolution information; this strategy further extends to capturing long-range dependencies with shallow stacks of convolutions and normalization, without time-consuming Transformers. Additionally, to reveal the role that the volume structure constructed by the specific IB plays during inference, we present the first in-depth study of low-resolution volumes, applying varying degrees of restraint and proposing three original statistical indicators that characterize the representations within the volumes. Experiments demonstrate that UStereo achieves competitive performance with fast speed and robust generalization, and ablation studies show the effectiveness of introducing the general IB. Moreover, our analysis of the low-resolution volumes suggests that they can be viewed as confidence volumes and that a more concentrated disparity distribution within them leads to better performance, which could extend the role of the specific IB.
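The abstract says the volume structure eventually yields a confidence volume from which disparity is predicted, and that a more concentrated disparity distribution inside the volume goes with better performance. The sketch below illustrates both ideas for a single pixel under explicit assumptions. It reads disparity out with soft-argmin regression, a common readout in volume-based stereo networks that is not necessarily the one UStereo uses. It also measures concentration with normalized entropy, a stand-in chosen for illustration. This is not one of the paper's three indicators, which the abstract does not define. All function names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over the disparity dimension of one pixel."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cost_to_confidence(cost: List[float]) -> List[float]:
    """Turn matching costs (lower = better match) into a confidence distribution.

    Hypothetical helper: costs are negated so the best match gets the highest
    probability.
    """
    return softmax([-c for c in cost])


def soft_argmin_disparity(cost: List[float]) -> float:
    """Soft-argmin: expected disparity under the confidence distribution.

    Assumption: this is a generic regression readout, not UStereo's own head.
    """
    probs = cost_to_confidence(cost)
    return sum(d * p for d, p in enumerate(probs))


def normalized_entropy(probs: List[float]) -> float:
    """Entropy divided by log(D): 0 for a one-hot peak, 1 for a uniform spread.

    Illustrative concentration measure only; lower means more concentrated.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


if __name__ == "__main__":
    peaked_cost = [5.0, 4.0, 0.0, 4.0, 5.0]  # clear minimum at disparity 2
    flat_cost = [1.0, 1.0, 1.0, 1.0, 1.0]    # no preferred disparity
    print(soft_argmin_disparity(peaked_cost))
    print(normalized_entropy(cost_to_confidence(peaked_cost)))
    print(normalized_entropy(cost_to_confidence(flat_cost)))
```

With a full cost volume of shape disparity × height × width, the same per-pixel readout would run at every spatial location. The peaked cost vector gives a much lower normalized entropy than the flat one, and this is the kind of "concentrated distribution" contrast the abstract describes.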
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7301