Gated Attention Bins for Depth Estimation

20 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: monocular depth estimation, gated attention, multiscale feature fusion
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Estimating accurate depth from a single RGB image is challenging because the problem is ill-posed: infinitely many 3D scenes can project to the same 2D image. We propose a method that converts monocular depth estimation from a regression problem into a classification problem. The attention mechanism's inherent ability to model long-range dependencies allows it to exploit the global information contained in fine-grained features. We compute gated attention on the highest-resolution feature map to capture associations among features and divide the depth range into bins whose centers are estimated adaptively for each image. The final depth values are obtained as linear combinations of the bin centers, weighted by the output of the multiscale feature fusion. We call our new building block GABins. Experiments demonstrate that our method improves performance across all metrics on both the KITTI and NYUv2 datasets compared to previous methods. Further, we explore the model's generalization capability via zero-shot testing.
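As a rough illustration of the bin-and-combine idea described in the abstract, the sketch below predicts adaptive bin centers from gated attention over a fine feature map and forms the final depth as a probability-weighted linear combination of those centers. All module names, the gating form, and hyperparameters (GABinsHead, n_bins, the sigmoid gate, etc.) are assumptions for illustration, not the authors' implementation; in the actual model the per-pixel bin probabilities would come from the multiscale feature fusion decoder rather than directly from the same feature map.

```python
# Minimal sketch of adaptive depth bins with a gated attention block (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GABinsHead(nn.Module):
    def __init__(self, channels=128, n_bins=256, min_depth=1e-3, max_depth=10.0):
        super().__init__()
        self.min_depth, self.max_depth = min_depth, max_depth
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.gate = nn.Linear(channels, channels)        # gating of the attention output
        self.to_bins = nn.Linear(channels, n_bins)       # per-image (normalized) bin widths
        self.to_logits = nn.Conv2d(channels, n_bins, 1)  # per-pixel bin probabilities

    def forward(self, feat):                              # feat: (B, C, H, W) fine features
        B, C, H, W = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)           # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)    # long-range associations
        gated = torch.sigmoid(self.gate(tokens)) * attn_out  # gated attention

        # Adaptive bin widths, then cumulative bin centers over the depth range
        widths = F.softmax(self.to_bins(gated.mean(dim=1)), dim=-1)                   # (B, n_bins)
        edges = self.min_depth + (self.max_depth - self.min_depth) * torch.cumsum(widths, dim=-1)
        centers = 0.5 * (F.pad(edges, (1, 0), value=self.min_depth)[:, :-1] + edges)  # (B, n_bins)

        # Per-pixel probabilities over bins (stand-in for the multiscale fusion output)
        probs = F.softmax(self.to_logits(feat), dim=1)                                # (B, n_bins, H, W)

        # Final depth: linear combination of bin centers weighted by the probabilities
        depth = torch.einsum('bnhw,bn->bhw', probs, centers).unsqueeze(1)             # (B, 1, H, W)
        return depth
```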
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2463