A Simple Baseline for Efficient Hand Mesh Reconstruction

Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang

Published: 01 Jan 2024, Last Modified: 10 Mar 2025, CVPR 2024, CC BY-SA 4.0

Abstract: Hand mesh reconstruction has attracted considerable attention in recent years, and a wide range of approaches has been proposed. Some of these methods incorporate complex components and designs which, while effective, complicate the model and hinder efficiency. In this paper, we decompose the mesh decoder into a token generator and a mesh regressor. Through extensive ablation experiments, we find that the token generator should select discriminating and representative points, while the mesh regressor needs to upsample sparse keypoints into dense meshes in multiple stages. With these two functionalities in place, high performance can be achieved with minimal computational resources. Based on this observation, we propose a simple yet effective baseline that outperforms state-of-the-art (SOTA) methods by a large margin across multiple datasets while maintaining real-time efficiency. On the FreiHAND dataset, our approach achieves a PA-MPJPE of 5.8 mm and a PA-MPVPE of 6.1 mm. Similarly, on the DexYCB dataset, it achieves a PA-MPJPE of 5.5 mm and a PA-MPVPE of 5.5 mm. In terms of speed, our method reaches up to 33 frames per second (fps) when using HRNet and up to 70 fps when employing FastViT-MA36. Code will be made available.
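The reported PA-MPJPE and PA-MPVPE figures are Procrustes-aligned errors: the prediction is first aligned to the ground truth with the best-fitting similarity transform (scale, rotation, translation), and the mean Euclidean distance is then taken over joints or mesh vertices. The following is a minimal NumPy sketch of that standard metric, not the authors' evaluation code; the function names `procrustes_align` and `pa_mean_error` are illustrative, and the 21-keypoint shape in the usage lines is an assumption about the hand skeleton.

```python
import numpy as np


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Align pred (N, 3) to gt (N, 3) with the least-squares similarity transform."""
    mu_pred = pred.mean(axis=0)
    mu_gt = gt.mean(axis=0)
    x = pred - mu_pred  # centered prediction
    y = gt - mu_gt      # centered ground truth

    # Kabsch/Umeyama: SVD of the 3x3 cross-covariance matrix.
    h = x.T @ y
    u, s, vt = np.linalg.svd(h)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])
    rot = vt.T @ correction @ u.T

    # Optimal isotropic scale for the chosen rotation.
    scale = (s * np.diag(correction)).sum() / (x ** 2).sum()

    return scale * (x @ rot.T) + mu_gt


def pa_mean_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-point Euclidean error after Procrustes alignment.

    Gives PA-MPJPE when the points are joints and PA-MPVPE when they are
    mesh vertices; the result is in the same unit as the inputs.
    """
    aligned = procrustes_align(pred, gt)
    return float(np.linalg.norm(aligned - gt, axis=1).mean())


# Usage with synthetic data (21 keypoints is an assumed hand skeleton size).
rng = np.random.default_rng(0)
gt_joints = rng.normal(size=(21, 3))
pred_joints = gt_joints + rng.normal(scale=0.01, size=(21, 3))
error = pa_mean_error(pred_joints, gt_joints)
```

Because the alignment removes global scale, rotation, and translation, the metric isolates errors in hand shape and articulation rather than in camera-space placement.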