Real-Time Consistent Monocular Depth Recovery System for Dynamic Environments

Published: 2025, Last Modified: 15 Feb 2026, IROS 2025, CC BY-SA 4.0
Abstract: Monocular depth estimation is essential for applications such as autonomous navigation and 3D reconstruction. However, achieving accurate and temporally consistent depth estimation in dynamic environments remains challenging due to scale ambiguity, sensitivity to dynamic objects, and inconsistent depth predictions across frames. Traditional SLAM-based methods ensure global consistency but perform poorly in dynamic scenes, while deep learning-based approaches lack absolute scale and temporal stability. To address these issues, we propose a Real-Time Consistent Monocular Depth Recovery System that combines ORB-SLAM3 for sparse depth initialization, a ViT-based depth completion network, and a motion segmentation module that improves robustness in dynamic environments. Additionally, we introduce a dual-weight fusion module that adaptively balances RGB semantic features and geometric depth priors to improve accuracy and consistency. Our system jointly optimizes static and dynamic regions to produce globally scale-consistent dense depth maps with improved temporal stability. Extensive experiments on benchmark datasets show that our approach outperforms existing methods in depth accuracy, temporal consistency, and robustness in dynamic scenes while maintaining real-time performance.
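The abstract does not say how the sparse ORB-SLAM3 depths fix the scale of the network output. As a hedged illustration of the general problem only (not the authors' method), a common baseline fits a scale s and shift t by least squares so that s * pred + t matches the sparse metric depths, ignoring points flagged as dynamic, since SLAM points on moving objects are unreliable. The function names, the flat-list layout, and the boolean motion mask below are all hypothetical assumptions for this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def align_scale_shift(
    pred: Sequence[float],
    sparse: Sequence[Optional[float]],
    dynamic: Sequence[bool],
) -> Tuple[float, float]:
    """Hypothetical helper: least-squares fit of (s, t) minimizing
    sum((s * pred + t - sparse) ** 2) over pixels that have a valid
    sparse depth and are NOT marked dynamic by a motion mask."""
    xs: List[float] = []
    ys: List[float] = []
    for p, d, is_dyn in zip(pred, sparse, dynamic):
        # Keep only static pixels with a positive sparse (SLAM) depth.
        if d is not None and d > 0 and not is_dyn:
            xs.append(p)
            ys.append(d)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two static sparse points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("predictions are constant; scale is undetermined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s = cov_xy / var_x          # closed-form ordinary least squares slope
    t = mean_y - s * mean_x     # and intercept
    return s, t


def apply_alignment(pred: Sequence[float], s: float, t: float) -> List[float]:
    """Map every predicted value (including dynamic pixels) to metric depth."""
    return [s * p + t for p in pred]


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]
    sparse = [2.5, None, 6.5, 50.0]      # 50.0 sits on a moving object
    dynamic = [False, False, False, True]
    s, t = align_scale_shift(pred, sparse, dynamic)
    print(s, t)                           # 2.0 0.5
    print(apply_alignment(pred, s, t))    # [2.5, 4.5, 6.5, 8.5]
```

Without the motion mask, the outlier depth of 50.0 on the moving object would pull the fitted scale far from the static-scene value. This is why the sketch filters by the mask before fitting. A real system would operate on full images, often fit in inverse depth, and use robust estimators; the sketch omits all of that for brevity.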