Lightweight Self-Supervised Monocular Depth Estimation via Context-Aware Fusion and Separable Depthwise Convolution

Published: 01 Jan 2025, Last Modified: 04 Nov 2025 · CSCWD 2025 · CC BY-SA 4.0
Abstract: Monocular depth estimation is a critical problem in computer vision, with wide-ranging applications across various domains. However, existing methods often incur high computational costs, making them difficult to deploy efficiently on edge devices. To address the trade-off between computational complexity and inference accuracy, this paper presents an efficient and lightweight model for self-supervised monocular depth estimation. Our model incorporates a Context-Aware Fusion (CAF) module to capture both global and local feature dependencies. In addition, Separable Depthwise Convolution (SDC) modules are utilized to reduce computational overhead, and the Multi-Scale Structural Similarity (MS-SSIM) loss function is employed to improve both depth estimation accuracy and visual perception quality. Experimental results show that the proposed model delivers improved accuracy while maintaining a lightweight and efficient architecture, ensuring its compatibility with edge-device deployment. A comprehensive analysis of the findings is also presented, along with insights for future optimizations and improvements.
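The abstract attributes the model's efficiency in part to separable depthwise convolution. While the paper's exact layer configuration is not given here, the parameter savings of factoring a standard convolution into a depthwise convolution followed by a 1×1 pointwise convolution can be sketched with a simple count (channel sizes below are illustrative assumptions, not values from the paper):

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a depthwise k x k conv (one filter per input
    channel) followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative layer: 64 -> 128 channels with a 3x3 kernel.
std = standard_conv_params(64, 128, 3)   # 64 * 128 * 9 = 73728
sep = separable_conv_params(64, 128, 3)  # 64 * 9 + 64 * 128 = 8768
print(std, sep, round(std / sep, 1))     # roughly an 8x reduction
```

This factorization trades a small drop in representational capacity for a large reduction in parameters and multiply-accumulate operations, which is the usual motivation for using it in edge-oriented depth networks.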