UniDex-ViTac: Learning Unified Visuo-Tactile Dexterous Manipulation Policy from Human Video Data

Published: 31 May 2026, Last Modified: 31 May 2026
Venue: Beyond Teleop workshop, ICRA 2026 (Oral)
License: CC BY 4.0
Keywords: Dexterous Manipulation, Visuo-Tactile Representation, Learning from Human Videos, Imitation Learning, Sim-to-Real
TL;DR: Learning a unified, multi-task dexterous manipulation policy from human videos by integrating global 3D point clouds with local binary tactile feedback to bridge the embodiment and contact gaps.
Abstract: Learning dexterous robotic manipulation directly from human videos is fundamentally challenged by the kinematic embodiment gap and by the absence of contact information, which cannot be observed in video. To address these limitations, we present UniDex-ViTac, a unified visuo-tactile imitation learning framework that distills physically feasible, contact-rich trajectories generated by residual RL specialists into a single multi-task generalist policy. Crucially, our generalist operates on an expressive visuo-tactile representation that explicitly fuses global 3D point clouds with local binary tactile feedback. By reasoning over both spatial geometry and local contact events, UniDex-ViTac achieves a 68.3% success rate in simulation and transfers robustly from simulation to a physical 16-DoF hand, where it reaches a 66.4% average success rate across diverse seen and unseen objects. Project page: https://unidex-vitac.github.io/
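The abstract does not specify how the point cloud and the binary tactile signals are combined. As a minimal sketch of one plausible scheme, and not the authors' implementation, the snippet below appends fingertip points to the scene point cloud and tags every point with two extra channels: a flag marking it as a tactile point and its binary contact state. The function name `fuse_visuo_tactile`, the five-channel layout, and the use of fingertip positions as tactile points are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def fuse_visuo_tactile(
    scene_points: Sequence[Point],
    fingertip_points: Sequence[Point],
    contacts: Sequence[bool],
) -> List[List[float]]:
    """Hypothetical fusion: return rows of [x, y, z, is_tactile, in_contact].

    Scene points carry zeros in both extra channels. Each fingertip point
    is flagged as tactile and carries its binary contact reading.
    """
    if len(fingertip_points) != len(contacts):
        raise ValueError("need exactly one contact flag per fingertip point")

    fused: List[List[float]] = []

    # Global geometry: visual points have no tactile information attached.
    for x, y, z in scene_points:
        fused.append([float(x), float(y), float(z), 0.0, 0.0])

    # Local contact events: one point per fingertip with a binary reading.
    for (x, y, z), touching in zip(fingertip_points, contacts):
        fused.append([float(x), float(y), float(z), 1.0, 1.0 if touching else 0.0])

    return fused


if __name__ == "__main__":
    scene = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2)]
    tips = [(0.05, 0.0, 0.1), (0.2, 0.1, 0.1)]
    for row in fuse_visuo_tactile(scene, tips, [True, False]):
        print(row)
```

Under this assumed layout, a point-based encoder could consume the fused array directly, so geometry and contact share one coordinate frame. Other designs, such as encoding the tactile vector separately and concatenating it with a point-cloud embedding, are equally consistent with the abstract.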
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 17