EDS-Depth: Enhancing Self-Supervised Monocular Depth Estimation in Dynamic Scenes
Abstract: Self-supervised monocular depth estimation usually assumes that training samples contain only static objects, which leads to poor performance in real-world environments. The presence of dynamic objects introduces camera motion estimation errors, motion blur, and occlusions, which pose significant challenges for network training. To address these issues, we introduce EDS-Depth, a self-supervised learning framework that improves monocular depth estimation in dynamic scenes. First, we propose a novel Temporal Continuity Enhancement (TCE) strategy to reduce the camera motion estimation errors and motion blur caused by dynamic objects: video frames are interpolated to produce a more temporally continuous sequence, smoothing dynamic changes and enriching motion details. Second, we design a novel Iterative Pseudo-Depth Masking (IPDM) module to address inaccurate object motion and occlusions in dynamic scenes. The module integrates optical flows from multiple frames for triangulation, generating optimal depth estimates that serve as pseudo-supervision labels in dynamic regions. Extensive experiments on the Cityscapes and KITTI datasets demonstrate the effectiveness of EDS-Depth, which surpasses state-of-the-art self-supervised monocular depth estimation methods, particularly in dynamic scenes.