Keywords: Robot Learning: Imitation Learning, Robot Learning: Foundation Models, Grasping & Manipulation
TL;DR: DexWild introduces a scalable, human hand data collection system and co-training framework that enables dexterous robot policies to generalize in the open world.
Abstract: Large-scale, diverse robot datasets have emerged as a promising path toward enabling dexterous manipulation policies to generalize to novel environments, but acquiring such datasets presents many challenges. While teleoperation provides high-fidelity datasets, its high cost limits its scalability. Instead, what if people could use their own hands, just as they do in everyday life, to collect data? In DexWild, a diverse team of data collectors uses their hands to collect hours of interactions across a multitude of environments and objects. To record this data, we create DexWild-System, a low-cost, mobile, and easy-to-use device. The DexWild learning framework co-trains on both human and robot demonstrations, leading to improved performance compared to training on each dataset individually. This combination results in robust robot policies capable of generalizing to novel environments, tasks, and embodiments with minimal additional robot-specific data. Experimental results demonstrate that DexWild significantly improves performance, achieving a 68.5% success rate in unseen environments, nearly four times higher than policies trained with robot data only, and offering 5.8× better cross-embodiment generalization.
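To make the co-training idea concrete, below is a minimal sketch of how mixed human–robot batches could be formed, assuming a PyTorch setup with two map-style demonstration datasets and a fixed sampling ratio. The dataset inputs, the `robot_fraction` knob, and the weighting scheme are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: form training batches that mix human and robot
# demonstrations at a fixed expected ratio, regardless of dataset sizes.
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler


def make_cotrain_loader(human_ds, robot_ds, batch_size=64, robot_fraction=0.5):
    """Return a DataLoader whose batches mix human and robot demos.

    human_ds, robot_ds: map-style datasets of (observation, action) pairs.
    robot_fraction: assumed knob for the expected share of robot samples per batch.
    """
    combined = ConcatDataset([human_ds, robot_ds])

    # Weight each sample so robot data contributes `robot_fraction` of a batch
    # in expectation, even when far more human data has been collected.
    n_h, n_r = len(human_ds), len(robot_ds)
    weights = torch.cat([
        torch.full((n_h,), (1.0 - robot_fraction) / n_h),
        torch.full((n_r,), robot_fraction / n_r),
    ])

    sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)
```

Weighting by dataset size rather than simply concatenating keeps a small robot dataset from being drowned out by the much larger in-the-wild human dataset during co-training.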
Submission Number: 2