Abstract: We address the problem of estimating a high-quality dense depth map from a single RGB input image. We first analyze the Conditional Random Field (CRF) in combination with transformers and exploit the multi-head attention mechanism to compute the potential function. We then propose spatial-window CRFs and channel-wise CRFs to capture information in the spatial and channel dimensions, and fuse them with a two-way fusion module, yielding Dual Aggregation CRFs (DCRFs). Finally, the multi-scale features aggregated by the DCRFs are used for internal scene clustering via slot attention to obtain the depth map. We call our method MSD-CRFs. Experiments demonstrate that our method improves performance across all metrics on KITTI and outperforms current SOTA results on the main ranking metric, Abs Rel, on NYU Depth-v2. Further, we explore the model's generalization capability via zero-shot testing.
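The abstract describes two attention branches, spatial-window and channel-wise, merged by a two-way fusion module. Below is a minimal sketch (not the authors' code) of that dual-aggregation idea, assuming non-overlapping window self-attention for the spatial branch, a C x C attention over flattened positions for the channel branch, and a learned sigmoid gate as the fusion; the module name, window size, and gating scheme are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualAggregationSketch(nn.Module):
    """Illustrative only: spatial-window attention plus channel-wise
    attention, blended by a learned two-way gate (assumed fusion)."""
    def __init__(self, dim: int, num_heads: int = 4, window: int = 7):
        super().__init__()
        self.window = window
        self.spatial_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)  # produces the two-way fusion gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W assumed divisible by the window size.
        B, C, H, W = x.shape
        w = self.window

        # Spatial branch: multi-head self-attention inside local windows.
        t = x.permute(0, 2, 3, 1)                               # (B, H, W, C)
        t = t.view(B, H // w, w, W // w, w, C)
        t = t.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)   # windows as batch
        s, _ = self.spatial_attn(t, t, t)                       # per-window attention
        s = s.view(B, H // w, W // w, w, w, C)
        s = s.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        s = s.permute(0, 3, 1, 2)                               # back to (B, C, H, W)

        # Channel branch: C x C attention over flattened spatial positions.
        f = x.flatten(2)                                        # (B, C, H*W)
        attn = torch.softmax(f @ f.transpose(1, 2) / f.shape[-1] ** 0.5, dim=-1)
        c = (attn @ f).view(B, C, H, W)

        # Two-way fusion: a learned gate blends the two branches.
        g = torch.sigmoid(self.gate(torch.cat([s, c], dim=1).permute(0, 2, 3, 1)))
        g = g.permute(0, 3, 1, 2)
        return g * s + (1 - g) * c

feat = torch.randn(1, 32, 28, 28)
out = DualAggregationSketch(dim=32)(feat)   # -> (1, 32, 28, 28)
```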