Abstract: Monocular depth estimation is a critical component in understanding spatial relationships for various
computer vision applications, including autonomous driving and augmented reality. However, accurate depth
prediction remains challenging due to two primary factors: (1) the low pixel density of objects in distant regions
and (2) the loss of essential features during the resolution reduction process in traditional encoder architectures.
To address these challenges, this work introduces an innovative encoder-decoder architecture that incorporates
uncertainty maps to improve feature extraction, particularly in long-distance regions. The proposed model utilizes
auxiliary uncertainty networks to identify areas with high prediction difficulty, enabling the generation of more
robust feature representations through hierarchical feature combinations. Additionally, the decoder architecture
is designed to emphasize structural details by introducing an uncertainty edge weighting mask (UEWM) generation
module, which further enhances depth prediction performance in challenging regions. Experimental results
demonstrate that the proposed method significantly improves depth estimation accuracy in long-range scenarios, as
evaluated on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Dense Depth for
Autonomous Driving (DDAD) datasets. These findings highlight the potential of this uncertainty-aware monocular
depth estimation approach for practical applications, including autonomous driving and robotic perception.
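As an illustration of the core idea only (the paper's actual UEWM module is a learned network component whose details are not given here), the following hypothetical sketch shows one way an uncertainty edge weighting mask could combine per-pixel uncertainty with depth-edge magnitude, so that a training loss is re-weighted toward hard, structured regions; the function name and the finite-difference edge estimate are assumptions for demonstration.

```python
def uncertainty_edge_weighting_mask(depth, uncertainty):
    """Hypothetical UEWM sketch: weight depth-edge magnitudes by per-pixel
    uncertainty, yielding a [0, 1] mask that emphasizes uncertain edges.

    depth, uncertainty: 2-D lists of floats with identical shape.
    """
    h, w = len(depth), len(depth[0])
    mask = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central finite-difference edge magnitude, clamped at borders.
            gx = depth[y][min(x + 1, w - 1)] - depth[y][max(x - 1, 0)]
            gy = depth[min(y + 1, h - 1)][x] - depth[max(y - 1, 0)][x]
            edge = (gx * gx + gy * gy) ** 0.5
            # High uncertainty AND strong edge -> large mask value.
            mask[y][x] = edge * uncertainty[y][x]
    # Normalize so the mask can directly scale a per-pixel loss term.
    peak = max(max(row) for row in mask) or 1.0
    return [[v / peak for v in row] for row in mask]
```

In a training loop, such a mask would typically multiply a per-pixel depth loss, so gradients concentrate on uncertain boundary regions rather than easy, smooth areas.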