Abstract: Current methods for 3D hand pose estimation fail to generalize well to new, in-the-wild scenarios due to varying camera viewpoints, self-occlusions, and complex environments. To address this problem, we propose CycleHand to improve the generalization ability of the model in a self-supervised manner. Our motivation is based on an observation: if one globally rotates the whole hand and then rotates it back, the estimated 3D finger poses should remain consistent before and after the rotation, because the wrist-relative hand pose is unchanged by a global 3D rotation. Hence, we propose arbitrary-rotation self-supervised consistency learning to improve the model's robustness to varying viewpoints. Another innovation of CycleHand is a high-fidelity texture map that renders the photorealistic rotated hand under different lighting conditions, backgrounds, and skin tones, further enhancing the effectiveness of our self-supervised task. To reduce the potential negative effects of the domain shift introduced by synthetic images, we apply contrastive learning to train a synthetic-real consistent feature extractor that captures domain-irrelevant hand representations. Experiments show that CycleHand substantially improves hand pose estimation performance on both canonical datasets and real-world applications.
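The geometric invariance underlying the abstract's consistency argument can be checked numerically. The sketch below (a minimal NumPy illustration, not the authors' code; the 21-joint layout with the wrist as joint 0 and the random-rotation sampler are assumptions) verifies that wrist-relative joint positions are recovered exactly after a global rotation and its inverse:

```python
import numpy as np

def wrist_relative(joints):
    """Express 3D joint positions relative to the wrist (assumed joint 0)."""
    return joints - joints[0]

def random_rotation(rng):
    """Sample a random proper 3D rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))      # fix column signs for a unique factorization
    if np.linalg.det(q) < 0:      # ensure det = +1 (rotation, not reflection)
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
joints = rng.normal(size=(21, 3))   # 21 hand joints, wrist first (assumption)
R = random_rotation(rng)

rotated = joints @ R.T              # globally rotate the whole hand
# Rotating the wrist-relative pose back with the inverse rotation (R^T,
# i.e. `@ R` in row-vector form) recovers the original relative pose.
recovered = wrist_relative(rotated) @ R
assert np.allclose(recovered, wrist_relative(joints))
```

In a training loop, the same identity would motivate a consistency loss between the pose predicted from the rotated rendering (rotated back) and the pose predicted from the original image.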