Abstract: Monocular depth estimation is a fundamental problem for robotic perception systems and their downstream applications. However, depth estimation from a single image is inherently ill-posed due to the information lost when projecting from 3D to 2D. Recent studies address the discrepancy between camera parameters with learning-based methods that unify the camera model into a canonical camera space or bipolar representation, thereby enabling a metric depth model to be trained across datasets with different camera parameters. In addition, our previous study, OrchardDepth, introduced a sparse-dense depth consistency loss to learn the dense depth distribution from urban autonomous-driving scenes and improve model performance in orchards. Instead of enforcing strict consistency between sparse and dense depth, this work introduces a KL divergence term that encourages the network to adapt to the depth distributions of different sensors, penalizing deviations in reliable regions while tolerating errors in unreliable areas. We further strengthen the depth consistency loss by discretising the supervised depth distribution into bins. This method significantly improves the robustness and performance of our previous approach, reducing the absolute relative error on the orchard dataset by 17.3% and 16.2% relative to the SILog loss and the OrchardDepth baseline, respectively, and advancing the training paradigm for depth estimation in orchard scenes.
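The abstract describes a KL divergence loss over binned (discretised) depth distributions; the paper's actual formulation is not given here, but the general idea can be sketched as follows. Everything below — the function name, the log-spaced bin layout, and the specific depth range — is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def binned_kl_depth_loss(pred_depth, sparse_depth, valid_mask,
                         num_bins=64, d_min=0.5, d_max=80.0, eps=1e-8):
    """Illustrative sketch (not the paper's code): discretise depths into
    log-spaced bins and penalise the KL divergence between the sparse
    (sensor) and dense (predicted) depth histograms over valid pixels."""
    edges = np.geomspace(d_min, d_max, num_bins + 1)
    # Histogram only over pixels with reliable sparse measurements.
    p, _ = np.histogram(sparse_depth[valid_mask], bins=edges)
    q, _ = np.histogram(pred_depth[valid_mask], bins=edges)
    # Normalise to probability distributions; eps avoids log(0).
    p = p / max(p.sum(), 1) + eps
    q = q / max(q.sum(), 1) + eps
    p /= p.sum()
    q /= q.sum()
    # KL(p || q): deviations where the sparse distribution has mass are
    # penalised strongly; bins with little sparse support contribute little,
    # which loosely mirrors "tolerating errors in unreliable areas".
    return float(np.sum(p * np.log(p / q)))
```

A distribution-level loss like this is softer than a per-pixel L1/L2 consistency term: a prediction that matches the sensor's overall depth statistics incurs a low penalty even where individual pixels disagree.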
External IDs: dblp:conf/iros/ZhengWGM25