BenchDepth: Are We on the Right Way to Evaluate Depth Foundation Models?

Zhenyu Li; Haotong Lin; Jiashi Feng; Peter Wonka; Bingyi Kang

12 Sept 2025 (modified: 14 Nov 2025). ICLR 2026 Conference Withdrawn Submission. License: CC BY 4.0.

Keywords: Depth Estimation, Benchmark

Abstract: Depth estimation is a fundamental computer vision task with diverse applications. Recent advances in deep learning have produced powerful depth foundation models (DFMs), yet their evaluation still focuses only on geometric accuracy. Because downstream tasks increasingly rely on depth as guidance, we present BenchDepth, a new benchmark that evaluates DFMs through five carefully selected proxy tasks: depth completion, stereo matching, monocular feed-forward 3D scene reconstruction, SLAM, and vision-language spatial understanding. Our approach assesses DFMs by their practical utility in real-world applications and provides information complementary to traditional benchmarks. We benchmark eight state-of-the-art DFMs and provide an in-depth analysis of the key findings. Interestingly, our results reveal discrepancies between model rankings on traditional geometric benchmarks and those on downstream tasks, suggesting that existing evaluation protocols do not fully capture the practical effectiveness of DFMs. This underscores the value of BenchDepth as a complementary benchmark that bridges the gap between geometry-centric metrics and application-driven evaluation.
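One way to quantify a "discrepancy between rankings" is a rank-correlation statistic. The sketch below is an assumption for illustration, not necessarily the protocol BenchDepth uses. It computes Kendall's tau-a between two rankings of the same set of models, for example a ranking from a geometric benchmark and one from a downstream proxy task. The function name `kendall_tau` and its dictionary input format are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Kendall's tau-a between two rankings of the same models (1 = best).

    Hypothetical helper for illustration. Returns +1.0 when both rankings
    agree on the relative order of every pair of models and -1.0 when they
    disagree on every pair. Tied ranks are counted as neither concordant
    nor discordant (tau-a does not correct for ties).
    """
    models = sorted(rank_a)
    if set(models) != set(rank_b):
        raise ValueError("both rankings must cover the same models")
    if len(models) < 2:
        raise ValueError("need at least two models to compare")

    concordant = discordant = 0
    for m, n in combinations(models, 2):
        # Positive product: both rankings order m and n the same way.
        sign = (rank_a[m] - rank_a[n]) * (rank_b[m] - rank_b[n])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For instance, if a geometric benchmark ranks three hypothetical models `model_a`, `model_b`, `model_c` as 1, 2, 3 and a downstream task swaps the first two, one of the three pairs disagrees and tau is 1/3; a value well below 1 signals that the two evaluations order the models differently. For real analyses with ties, `scipy.stats.kendalltau` (tau-b by default) is a common alternative.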

Primary Area: datasets and benchmarks

Submission Number: 4489