HiMoDepth: Efficient Training-Free High-Resolution On-Device Depth Perception

Jinrui Zhang; Huan Yang; Ju Ren; Deyu Zhang; Bangwen He; Youngki Lee; Ting Cao; Yuanchun Li; Yaoxue Zhang; Yunxin Liu

Published: 01 Jan 2024, Last Modified: 16 May 2025. Venue: IEEE Trans. Mob. Comput. 2024. License: CC BY-SA 4.0

Abstract: High-resolution depth estimation, with a minimum resolution of $1280\times 960$, is essential for achieving more immersive experiences in on-device 3D vision applications. However, implementing high-resolution solutions on resource-limited mobile devices presents significant challenges: existing approaches require additional expensive depth sensors, computation-intensive machine learning models trained on large-scale datasets, or device motion while the target object remains stationary. In this study, we propose HiMoDepth, an efficient training-free high-resolution depth estimation system that uses widely available on-device dual cameras. HiMoDepth consists of two modules: 1) a homogenization module that iteratively crops the fields of view of the heterogeneous on-device cameras until their focal lengths are equal and filters out out-of-sync frames based on timestamps, and 2) a hierarchical, mobile-GPU-friendly stereo matching method that reduces the latency of stereo matching for high-resolution depth maps by using an efficient data layout, reducing the number of memory accesses, and searching for the corresponding pixel over a coarse-to-fine hierarchy. We implement HiMoDepth on multiple commodity mobile devices and conduct comprehensive evaluations. Experimental results show that HiMoDepth significantly outperforms the baselines in both accuracy and running speed on mobile devices that support high-resolution depth maps.
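
The camera homogenization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an ideal pinhole model, computes the crop in closed form rather than iteratively, and uses a simple greedy timestamp pairing. All function names, parameters, and the skew threshold are hypothetical.

```python
import math


def crop_width_for_tele_fov(wide_width_px, wide_fov_deg, tele_fov_deg):
    """Width in pixels of the centered crop of the wide image that spans
    the same horizontal field of view as the narrower (tele) camera.
    Assumes an ideal pinhole camera with no lens distortion."""
    # Pinhole relation: f_px = (W / 2) / tan(FoV / 2)
    f_wide_px = (wide_width_px / 2) / math.tan(math.radians(wide_fov_deg) / 2)
    return 2 * f_wide_px * math.tan(math.radians(tele_fov_deg) / 2)


def focal_after_resize(f_px, crop_width_px, out_width_px):
    """Focal length in pixels after cropping to crop_width_px and
    resizing the crop to out_width_px. Cropping alone leaves f_px
    unchanged; the resize scales it."""
    return f_px * out_width_px / crop_width_px


def pair_synced_frames(ts_a_ms, ts_b_ms, max_skew_ms):
    """Greedy two-pointer pairing of two sorted timestamp lists.
    Returns index pairs (i, j) whose timestamps differ by at most
    max_skew_ms; frames without such a partner are dropped."""
    pairs = []
    i = j = 0
    while i < len(ts_a_ms) and j < len(ts_b_ms):
        skew = ts_a_ms[i] - ts_b_ms[j]
        if abs(skew) <= max_skew_ms:
            pairs.append((i, j))
            i += 1
            j += 1
        elif skew < 0:
            i += 1  # frame from camera A is too old, skip it
        else:
            j += 1  # frame from camera B is too old, skip it
    return pairs
```

For example, under these assumptions a 4000-pixel-wide image from a 90-degree camera would be cropped to its central 2000 pixels to match a tele camera whose horizontal field of view is `2 * atan(0.5)` (about 53 degrees). Resizing that crop to the tele camera's width then gives both images the same focal length in pixels, which is the precondition for standard stereo matching.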
