Keywords: hand texture prior, pixel attention, monocular hand reconstruction, self-supervised texture learning
Domains: Vision and Learning
TL;DR: We show that UV-space texture alignment can act as a dense supervisory signal to improve monocular 3D hand reconstruction, boosting HaMeR with a lightweight plug-and-play texture module.
External Link: https://openaccess.thecvf.com/content/WACV2026/papers/Karvounas_Enhancing_Monocular_3D_Hand_Reconstruction_with_Learned_Texture_Priors_WACV_2026_paper.pdf
Abstract: We revisit the role of texture in monocular 3D hand reconstruction, not as an afterthought for photorealism, but as a dense, spatially grounded cue that can actively support pose and shape estimation. Our observation is simple: even in high-performing models, the overlay between predicted hand geometry and image appearance is often imperfect, suggesting that texture alignment may be an underused supervisory signal. We propose a lightweight texture module that embeds per-pixel observations into UV texture space and enables a novel dense alignment loss between predicted and observed hand appearances. Our approach assumes access to a differentiable rendering pipeline and a model that maps images to 3D hand meshes with known topology, allowing us to render a textured hand back onto the image and perform pixel-based alignment. The module is self-contained and easily pluggable into existing reconstruction pipelines. To isolate and highlight the value of texture-guided supervision, we augment HaMeR, a high-performing yet unadorned transformer architecture for 3D hand pose estimation. The resulting system improves both accuracy and realism, demonstrating the value of appearance-guided alignment in hand reconstruction.
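The two operations named in the abstract, embedding per-pixel observations into UV space and comparing rendered against observed appearance, can be sketched in NumPy under stated assumptions. This is not the authors' module: it assumes a rasterizer has already produced per-pixel UV coordinates and a coverage mask for the predicted mesh (hypothetical inputs `pixel_uv` and `mask`), uses nearest-texel averaging for splatting, and uses a mean L1 photometric error as a stand-in for the dense alignment loss. In a trainable pipeline these steps would run inside a differentiable renderer so that gradients reach the mesh parameters.

```python
import numpy as np

def splat_to_uv(pixel_uv, colors, mask, tex_res):
    """Average observed pixel colors into a UV texture map (nearest-texel splatting).

    pixel_uv: (H, W, 2) per-pixel UV coordinates in [0, 1] (assumed rasterizer output)
    colors:   (H, W, 3) observed image colors
    mask:     (H, W) bool, True where the predicted hand mesh covers the pixel
    Returns the (tex_res, tex_res, 3) texture and a bool map of observed texels.
    """
    uv = pixel_uv[mask]                          # (N, 2) UVs of covered pixels
    rgb = colors[mask]                           # (N, 3) their observed colors
    # Map continuous UV to integer texel indices (u -> column, v -> row).
    idx = np.clip((uv * tex_res).astype(int), 0, tex_res - 1)
    flat = idx[:, 1] * tex_res + idx[:, 0]
    tex_sum = np.zeros((tex_res * tex_res, 3))
    count = np.zeros(tex_res * tex_res)
    # Unbuffered accumulation so repeated texel indices are all summed.
    np.add.at(tex_sum, flat, rgb)
    np.add.at(count, flat, 1)
    texture = np.zeros_like(tex_sum)
    seen = count > 0
    texture[seen] = tex_sum[seen] / count[seen, None]
    return texture.reshape(tex_res, tex_res, 3), seen.reshape(tex_res, tex_res)

def sample_texture(texture, pixel_uv, mask):
    """Nearest-texel lookup: the texture's color at every covered pixel."""
    tex_res = texture.shape[0]
    idx = np.clip((pixel_uv[mask] * tex_res).astype(int), 0, tex_res - 1)
    return texture[idx[:, 1], idx[:, 0]]

def dense_alignment_loss(texture, pixel_uv, colors, mask):
    """Mean L1 error between texture rendered at covered pixels and observed colors."""
    if not mask.any():
        return 0.0
    rendered = sample_texture(texture, pixel_uv, mask)
    return float(np.abs(rendered - colors[mask]).mean())
```

Splatting an image into UV space and sampling it back at the same pixels reproduces the observations, so the loss only becomes informative when the texture being compared (for example, a predicted texture) disagrees with what the image shows under the current geometry. Nearest-texel lookup keeps the sketch short; a differentiable pipeline would typically use bilinear sampling instead.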
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 97