Keywords: Dexterous Manipulation, Reinforcement Learning, Learning from Demonstrations
Abstract: Hand–object motion-capture (MoCap) repositories provide abundant, contact-rich human demonstrations for scaling dexterous manipulation on robots. Yet demonstration inaccuracy and embodiment gaps between human and robot hands hinder direct policy learning. Existing pipelines adopt a three-stage workflow: retargeting, tracking, and residual correction. This multi-step process may not fully exploit the demonstrations and can accumulate compounding errors. We introduce Reference-Scoped Exploration (RSE), a unified, single-loop optimization that integrates retargeting and tracking to train a scalable robot control policy directly from MoCap. Rather than treating demonstrations as strict ground truth, we treat them as soft guidance. From raw demonstrations, we construct adaptive spatial scopes (time-varying termination boundaries), and reinforcement learning encourages the policy to stay within these envelopes while minimizing control effort. This holistic approach preserves demonstration intent, allows robot-specific strategies to emerge, improves robustness to noise, and scales readily with large demonstration sets. Finally, we distill the scaled tracking policy into a vision-based, skill-conditioned generative control policy. This distilled policy captures diverse manipulation skills within a rich latent representation, enabling generalization across various objects and real-world robotic manipulation.
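The abstract's core mechanism (adaptive spatial scopes acting as time-varying termination boundaries, with a reward that favors staying in scope while penalizing control effort) can be illustrated with a minimal sketch. This is a hypothetical reading of the idea, not the paper's implementation: the scope schedule `scope_radius`, the reward weights, and the function names are all assumptions for illustration.

```python
import numpy as np

def scope_radius(t, r_min=0.02, r_max=0.10, decay=0.01):
    # Hypothetical time-varying scope: wide early in the episode,
    # tightening toward r_min as tracking proceeds.
    return r_min + (r_max - r_min) * np.exp(-decay * t)

def rse_step(robot_pose, ref_pose, action, t, effort_weight=0.1):
    """Sketch of a reference-scoped RL step (illustrative, not the paper's code).

    The MoCap reference is soft guidance: the episode terminates only when
    the robot leaves the adaptive scope around the reference, and within the
    scope the reward trades off proximity against control effort.
    """
    deviation = np.linalg.norm(robot_pose - ref_pose)
    radius = scope_radius(t)
    terminated = deviation > radius            # time-varying termination boundary
    in_scope_bonus = 1.0 - deviation / radius  # soft guidance, not strict tracking
    effort_penalty = effort_weight * np.sum(action ** 2)
    reward = 0.0 if terminated else in_scope_bonus - effort_penalty
    return reward, terminated
```

Because the boundary (rather than a per-timestep tracking loss) defines failure, the policy is free to deviate from noisy demonstration frames as long as it stays inside the envelope, which is how robot-specific strategies can emerge.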
Submission Number: 787