IFVONet: Integrating Inter-Frame Variation and Occlusion Awareness for 3D Hand Mesh Reconstruction

Shiyao Wu, John Panneerselvam, Lu Liu, Rajeev Raman, Tianjin Huang

Published: 03 Dec 2024, Last Modified: 05 Mar 2025 | OpenReview Archive Direct Upload | Everyone | CC BY-SA 4.0

Abstract: Reconstructing 3D hand meshes from video is challenging because manipulated objects often occlude the hand. These occlusions degrade the information that can be extracted from the obscured regions and weaken the temporal coherence of the reconstructed hand. Existing approaches focus primarily on global occlusion regions but overlook temporal hand coherence, which limits their performance. We propose IFVONet, a novel framework designed to improve 3D hand mesh reconstruction by effectively capturing inter-frame variations and improving the recovery of global occlusions. IFVONet comprises three key components: (1) a Pixel-Domain Variation Module that identifies inter-frame variations across adjacent frames, enhancing temporal hand coherence; (2) an Enhanced Global Occlusion Recovery Module that integrates hand information into the global occlusion representation, improving the accuracy of occlusion feature recovery; and (3) a Hand Regression Module that dynamically aggregates hand information from inter-frame variations and globally recovered occlusion features into comprehensive hand representations, leading to improved 3D hand reconstruction. Extensive experiments on the HO3D-v2 and HO3D-v3 datasets demonstrate that IFVONet achieves state-of-the-art performance on both 3D hand mesh reconstruction and pose estimation.
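
To make the notion of pixel-domain inter-frame variation concrete, the sketch below computes a per-pixel absolute difference between adjacent grayscale frames. This is only an illustrative assumption about what "variation across adjacent frames" could look like in its simplest form; the abstract does not specify how the Pixel-Domain Variation Module is implemented, and the names `Frame`, `inter_frame_variation`, and `variation_sequence` are hypothetical.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = List[List[float]]


def inter_frame_variation(prev: Frame, curr: Frame) -> Frame:
    """Per-pixel absolute difference between two adjacent frames.

    Assumption: plain frame differencing stands in for the paper's
    (unspecified) variation computation.
    """
    same_rows = len(prev) == len(curr)
    same_cols = all(len(p) == len(c) for p, c in zip(prev, curr))
    if not (same_rows and same_cols):
        raise ValueError("frames must have the same shape")
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def variation_sequence(frames: List[Frame]) -> List[Frame]:
    """Variation maps for every pair of adjacent frames in a clip."""
    return [inter_frame_variation(a, b) for a, b in zip(frames, frames[1:])]
```

A clip of N frames yields N - 1 variation maps. Pixels that change between frames, for example where the hand or the manipulated object moves, have non-zero values, while static background stays at zero.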