DiffSQL: Leveraging Diffusion Model for Zero-Shot Self-Supervised Monocular Depth Estimation

Published: 2025, Last Modified: 07 Jan 2026, IJCAI 2025, CC BY-SA 4.0
Abstract: Self-supervised monocular depth estimation has attracted significant attention because of its broad applications in autonomous driving and robotics. The Self Query Layer (SQL), introduced in SQLdepth, brought significant performance improvements by learning the relative distances between objects. However, it struggles with zero-shot generalization because it lacks geometric features and uses a fixed query size. To address these problems, we propose DiffSQL, a diffusion-augmented self-supervised depth estimation framework that learns geometric priors for feature augmentation. We also introduce a dynamic self-query layer that implicitly computes the relative distances between objects by adapting the query size to the feature distribution. On the KITTI dataset, DiffSQL outperforms SQLdepth by 1.03% in AbsRel and by 2.79% in SqRel. Our experiments further show that DiffSQL achieves superior zero-shot generalization.
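To make the abstract's terms concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' code. It roughly follows the published SQLdepth description of a self query layer: a set of Q query vectors is dotted with every per-pixel embedding, producing a (Q, H, W) "self-cost volume" whose entries act as relative-distance cues. The abstract does not say how DiffSQL chooses its query size dynamically, so here Q is simply the number of rows passed in. The depth readout (softmax over queries, then a weighted sum of bin centres) is an assumed AdaBins-style head, and the bin centres are hypothetical. `abs_rel` and `sq_rel` use the standard definitions of the AbsRel and SqRel metrics reported on KITTI, with the assumption that pixels where ground truth is 0 are invalid and masked out.

```python
import numpy as np


def self_cost_volume(features: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Dot product of every query with every pixel embedding.

    features: (C, H, W) per-pixel embeddings.
    queries:  (Q, C) query vectors; Q plays the role of the "query size".
    Returns a (Q, H, W) volume, one similarity map per query.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)            # (C, H*W)
    volume = queries @ flat                      # (Q, H*W)
    return volume.reshape(queries.shape[0], h, w)


def depth_from_volume(volume: np.ndarray, bin_centers: np.ndarray) -> np.ndarray:
    """Assumed AdaBins-style readout: softmax over the query axis, then a
    probability-weighted sum of per-query depth bin centres."""
    logits = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)             # weights sum to 1 per pixel
    return np.einsum("q,qhw->hw", bin_centers, probs)     # (H, W) depth map


def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """AbsRel = mean(|d - d*| / d*) over pixels with valid ground truth."""
    mask = gt > 0
    return float(np.mean(np.abs(pred[mask] - gt[mask]) / gt[mask]))


def sq_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """SqRel = mean((d - d*)^2 / d*) over pixels with valid ground truth."""
    mask = gt > 0
    return float(np.mean((pred[mask] - gt[mask]) ** 2 / gt[mask]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((32, 4, 6))      # C=32 channels on a 4x6 feature map
    for q in (16, 64):                           # two query sizes, each fixed per call
        queries = rng.standard_normal((q, 32))
        bins = np.linspace(1.0, 80.0, q)         # hypothetical bin centres in metres
        depth = depth_from_volume(self_cost_volume(feats, queries), bins)
        print(q, depth.shape)
    gt = np.array([2.0, 4.0, 0.0])               # last pixel has no ground truth
    pred = np.array([2.2, 3.6, 5.0])
    print(abs_rel(pred, gt), sq_rel(pred, gt))
```

Because the softmax weights at each pixel sum to 1, every predicted depth is a convex combination of the bin centres and stays within their range. Changing Q only changes how many similarity maps are computed, which is the knob a dynamic self-query layer would adjust.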