ScaleDepth: Decomposing Metric Depth Estimation into Semantic-aware Scale Prediction and Adaptive Relative Depth Estimation
Abstract: Estimating the depth map of an image in the wild is a challenging visual task. Compared to relative depth estimation, metric depth estimation attracts more attention due to its practical physical significance and critical applications in real-life scenarios. However, existing depth estimation methods typically focus only on the generalization of relative depth, neglecting the importance of metric-depth generalization. To address this challenge, we propose a novel monocular depth estimation method called ScaleDepth. It decomposes metric depth into scene scale and relative depth and predicts them through a semantic-aware scale prediction (SASP) module and an adaptive relative depth estimation (ARDE) module, respectively. Our proposed approach has several merits. First, the SASP module can implicitly combine the structural and semantic features of an image to predict a precise scene scale. Second, the ARDE module can adaptively estimate the relative depth distribution of each image within a normalized depth space. Third, our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework, without the need to set the depth range or fine-tune the model. Extensive experiments demonstrate that our method achieves competitive performance in indoor, outdoor, unconstrained, and unseen scenes. Project page: https://ruijiezhu94.github.io/ScaleDepth.
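The core decomposition the abstract describes, i.e. recovering metric depth as the product of a predicted scene scale and a relative depth map in a normalized depth space, can be sketched as follows. This is a minimal illustration with hypothetical names; in ScaleDepth the scale and relative depth come from the learned SASP and ARDE modules, which are not reproduced here.

```python
def compose_metric_depth(relative_depth, scene_scale):
    """Recover a metric depth map from normalized relative depth and a scalar scene scale.

    Illustrates the decomposition: metric depth = scene scale * relative depth,
    where relative depth is assumed to lie in a normalized [0, 1] space.
    Function and argument names are illustrative, not the paper's interface.
    """
    return [
        [scene_scale * max(0.0, min(1.0, d)) for d in row]  # clamp to the normalized space
        for row in relative_depth
    ]

# Example: a 2x2 relative depth map with a hypothetical predicted scale of 10 m
rel = [[0.1, 0.5], [0.9, 1.0]]
metric = compose_metric_depth(rel, 10.0)
```

Because the relative depth is normalized per image, the same framework covers indoor and outdoor scenes: only the predicted scene scale changes, and no fixed depth range needs to be set in advance.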
External IDs: doi:10.1109/tcsvt.2026.3667405