LDCM-MViT: A Lightweight Depth Completion Model Based on MViT

Published: 01 Jan 2024, Last Modified: 08 Oct 2024 · ICIC (LNAI 2) 2024 · CC BY-SA 4.0
Abstract: In the field of computer vision, many perception methods rely on depth information captured by depth cameras. However, the integrity of depth maps is compromised by the reflection and refraction of light on transparent objects. Existing depth completion methods are often impractical due to depth estimation errors or unacceptably slow inference speeds. To address this challenge, we propose LDCM-MViT, a lightweight depth completion model based on the Mobile Vision Transformer (MViT) that uses a Mobile Guide Block (MGB). The MGB efficiently fuses features from RGB and depth maps with a limited number of parameters. Furthermore, we provide two fusion strategies for combining RGB and depth features to obtain the final depth map. Finally, we compare the performance of LDCM-MViT with the DDC-SRGBD and GuideFormer models on the Matterport3D and KITTI datasets. Experimental results show that our model achieves higher accuracy than traditional methods while using fewer parameters, especially on edge devices.
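The abstract describes the MGB as a lightweight module that fuses RGB and depth features. Since the paper's exact architecture is not given here, the following is only a minimal sketch of one plausible lightweight RGB-depth fusion block built from depthwise separable convolutions and a channel gate; the class names, layer choices, and channel sizes are illustrative assumptions, not the authors' MGB.

```python
# Hypothetical sketch of a lightweight RGB-depth feature fusion block.
# NOT the paper's Mobile Guide Block (MGB); layer choices are assumptions.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution, a common low-parameter building block."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))


class GuidedFusionBlock(nn.Module):
    """Fuses RGB and depth feature maps with a per-channel gate.

    Illustrates one way to blend the two modalities under a small parameter
    budget; the actual LDCM-MViT fusion strategies may differ.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.rgb_proj = DepthwiseSeparableConv(channels, channels)
        self.depth_proj = DepthwiseSeparableConv(channels, channels)
        # Squeeze-and-excitation-style gate driven by the concatenated features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )
        self.out = DepthwiseSeparableConv(channels, channels)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        r = self.rgb_proj(rgb_feat)
        d = self.depth_proj(depth_feat)
        g = self.gate(torch.cat([r, d], dim=1))   # per-channel weights in [0, 1]
        return self.out(g * r + (1 - g) * d)      # gated blend of RGB and depth features


if __name__ == "__main__":
    block = GuidedFusionBlock(channels=64)
    rgb = torch.randn(1, 64, 32, 32)
    depth = torch.randn(1, 64, 32, 32)
    print(block(rgb, depth).shape)  # torch.Size([1, 64, 32, 32])
```

The gated blend keeps the parameter count low because all spatial convolutions are depthwise separable, which is in the spirit of the model's goal of fast inference on edge devices.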