Fixing Defect of Photometric Loss for Self-Supervised Monocular Depth Estimation
Abstract: View-synthesis-based methods have shown promising results for unsupervised depth estimation from single images. Most existing approaches synthesize a new image and employ it as the supervision signal for depth and pose prediction. These approaches suffer from two problems: 1) many combinations of pose and depth can synthesize the same new image, so recovering depth and pose from only two images via view synthesis is inherently ill-posed; 2) the model is trained under the photometric consistency assumption that brightness (or image gradient) remains constant across video frames, an assumption that is easily violated in realistic scenes by lighting changes, reflective surfaces, and occlusions. To overcome the first drawback, we exploit a point cloud consistency constraint to eliminate the ambiguity. To overcome the second drawback, we use threshold masks to filter out dynamic and occluded points, and we introduce matching-point constraints that implicitly encode the geometric relationship between two matched points to improve the precision of depth prediction. In addition, we employ epipolar constraints to compensate for the instability of the photometric error in textureless regions and under varying illumination. Experimental results on the KITTI, Cityscapes, and NYUv2 datasets show that the method improves the accuracy of depth prediction and enhances the model's robustness to textureless regions and illumination changes. The code and data are available at
https://github.com/XTUPRLAB/FixUnDepth.
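For context, below is a minimal PyTorch sketch of the standard SSIM + L1 photometric loss that the abstract critiques, together with a generic algebraic epipolar residual of the kind an epipolar constraint could penalize. The 3x3 SSIM window, the 0.85 weighting, and all function names follow common practice in Monodepth2-style pipelines and are assumptions for illustration, not code from this paper.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y):
    """Per-pixel DSSIM, (1 - SSIM) / 2, over 3x3 windows.
    A common differentiable structural term in self-supervised depth."""
    C1, C2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, warped, alpha=0.85):
    """Standard SSIM + L1 photometric error between the target frame and
    the view synthesized (warped) from a source frame. It assumes
    brightness constancy, which the paper argues breaks under lighting
    changes, reflective surfaces, and occlusions. Returns a per-pixel map."""
    l1 = (target - warped).abs()
    return alpha * ssim_loss(target, warped) + (1 - alpha) * l1

def epipolar_residual(p1, p2, F_mat):
    """Algebraic epipolar error |p2^T F p1| for N matched homogeneous
    points of shape (N, 3). This residual does not depend on pixel
    intensities, so it stays informative in textureless regions and under
    illumination changes; the exact constraint in the paper may differ."""
    return torch.einsum('ni,ij,nj->n', p2, F_mat, p1).abs()

# Hypothetical usage on dummy tensors:
target = torch.rand(1, 3, 192, 640)
warped = torch.rand_like(target)
loss = photometric_loss(target, warped).mean()
```

Because the photometric map above is uninformative wherever brightness constancy fails, an intensity-free geometric term such as the epipolar residual can supply a complementary training signal in exactly those regions.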