AnyDepth: Depth Estimation Made Easy

ICLR 2026 Conference Submission 17111 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Zero-shot monocular depth estimation, Data-Centric Learning, Lightweight Decoder, Computer Vision
Abstract: Recent monocular depth estimation models have achieved impressive performance. However, they typically rely on traditional encoders, complex decoders, and large training sets, which collectively limit their efficiency and generalization. In this work, we pursue a complementary approach: building a lightweight and efficient training framework without sacrificing accuracy. First, we apply DINOv3 to zero-shot monocular depth estimation for the first time. Second, we design SDT, a lightweight decoder that reduces parameter count and computational cost while maintaining performance. Third, inspired by data-centric learning, we analyze the characteristics a high-quality sample should possess and propose a filtering strategy based on these characteristics to remove low-quality samples, reducing dataset size while improving training quality. Experiments on multiple benchmarks demonstrate that, despite using fewer parameters and less data, our method achieves accuracy comparable to or higher than similar methods of larger scale. Our work highlights the combination of a strong visual backbone, an efficient decoder, and high-quality data as a path toward simpler and more efficient zero-shot monocular depth estimation pipelines.
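The abstract gives no implementation details, but the pipeline shape it describes (frozen DINOv3 patch features fed to a small dense-prediction head) can be sketched as below. This is a minimal illustration, not the authors' code: the `torch.hub` entrypoint name, the feature-dict key (which follows the DINOv2 convention), and the `SDTDecoder` structure are all assumptions, since the paper names SDT but does not specify it here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SDTDecoder(nn.Module):
    """Hypothetical stand-in for the paper's lightweight SDT decoder:
    it projects ViT patch tokens to a single channel and upsamples
    to a dense (relative) depth map."""

    def __init__(self, embed_dim: int = 768, patch: int = 16):
        super().__init__()
        self.patch = patch
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
        )

    def forward(self, tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # tokens: (B, N, C) patch features -> (B, C, H/p, W/p) grid.
        b, n, c = tokens.shape
        grid = tokens.transpose(1, 2).reshape(b, c, h // self.patch, w // self.patch)
        depth = self.head(grid)
        # Upsample to input resolution; output is relative depth, no metric scale.
        return F.interpolate(depth, size=(h, w), mode="bilinear", align_corners=False)


# Frozen backbone + trainable light decoder (hub entrypoint name is an assumption).
backbone = torch.hub.load("facebookresearch/dinov3", "dinov3_vitb16")
backbone.eval().requires_grad_(False)
decoder = SDTDecoder(embed_dim=768, patch=16)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    # Dict key assumed from the DINOv2-style API: (B, N, C) patch tokens.
    feats = backbone.forward_features(x)["x_norm_patchtokens"]
depth = decoder(feats, 224, 224)  # (B, 1, 224, 224)
```

Only the decoder parameters are trained here, which is what keeps the framework lightweight; the data-filtering stage is not sketched because the abstract does not state the quality criteria it uses.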
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17111