Benchmarking Stereo Geometry Estimation in the Wild

20 Sept 2025 (modified: 09 Oct 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0
Keywords: 3d vision, stereo, benchmark, depth
TL;DR: We benchmark stereo, monocular, and multi-view geometry estimators on 10 real and synthetic datasets
Abstract: We study recent progress in stereo geometry estimation in the wild. Although recent stereo methods have achieved impressive benchmark results, the standard stereo benchmarks invest tremendous effort into obtaining high-quality, perfectly calibrated stereo pairs, which are difficult to obtain in practice in in-the-wild settings. To address this in-the-wild evaluation gap, we introduce StereoBench, a benchmark for stereo, monocular, and multi-view geometry estimation methods comprising 26 method variants and 10 datasets within a unified depth-based evaluation protocol. Our findings reveal that although stereo methods perform well on high-quality benchmarks and out-of-domain synthetic data, they perform quite poorly on real-world data: even monocular methods, which have access to strictly less information, often do better. We find that classic approaches to online stereo rectification cannot close this gap. Instead, a far more effective strategy is to repurpose feed-forward multi-view geometry networks (such as VGGT) for calibration-robust stereo prediction, which significantly outperforms dedicated stereo networks in the real world. We hope that our benchmark reveals crucial insights into robust stereo depth estimation in the wild that generalizes outside the domain of high-quality synthetic and benchmark inputs.
Primary Area: datasets and benchmarks
Submission Number: 22786