Keywords: 3D two-hand reconstruction
Abstract: Estimating 3D hand pose and shape from monocular images has recently garnered significant attention, as it underpins numerous applications in animation, AR/VR, and embodied AI. Many computer vision tasks have demonstrated the substantial benefit of incorporating additional task-relevant reference information. In this paper, we investigate whether the principle of ``the more you know, the better you understand'' also applies to two-hand recovery. Unlike previous methods that rely solely on monocular image features for hand estimation, we extract 2D keypoint, segmentation-map, and depth-map features and integrate them with the image features; the hand regressor then estimates hand parameters from the fused features. The 2D keypoints and segmentation map provide detailed XY-dimensional reference information for the fingers, while the depth map offers pixel-level relative Z-dimensional reference information. Recovering the 3D hand from these intermediate representations should be more straightforward than doing so from the original RGB image alone. Current foundation models already achieve impressive performance on these basic tasks, allowing us to obtain reliable results in most cases. However, when the two hands overlap significantly, complex entanglements arise and hand penetration becomes likely; the additional reference information (segmentation map and depth map) cannot help in the occluded regions, and the predicted 2D keypoints there are also unreliable. To this end, we further employ a two-hand diffusion model as a prior and use gradient guidance to refine the two-hand contact. Extensive experiments demonstrate that our approach achieves superior performance in 2D consistency alignment and depth recovery.
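The feature-fusion idea in the abstract can be sketched as follows. This is a minimal illustrative sketch in plain NumPy, not the authors' implementation: all feature names, dimensions, and the linear "regressor" are assumptions made for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vectors produced by off-the-shelf foundation models
# (dimensions are illustrative, not taken from the paper).
image_feat    = rng.standard_normal(64)  # monocular RGB image features
keypoint_feat = rng.standard_normal(16)  # encoded 2D keypoints
seg_feat      = rng.standard_normal(16)  # encoded segmentation map
depth_feat    = rng.standard_normal(16)  # encoded relative depth map

# Fuse by simple concatenation; a real model would learn a fusion module.
fused = np.concatenate([image_feat, keypoint_feat, seg_feat, depth_feat])

# Stand-in "hand regressor": a single linear map from fused features to
# MANO-style hand parameters (e.g. 45 pose + 10 shape + 6 global = 61, assumed).
W = rng.standard_normal((61, fused.shape[0])) * 0.01
hand_params = W @ fused

print(fused.shape, hand_params.shape)  # (112,) (61,)
```

In practice the fusion and regression would be learned jointly; the sketch only shows how the intermediate representations enter the regressor alongside the image features.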
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2779