Keywords: Imitation learning, Manipulation, Sensor integration
TL;DR: We integrate sensors into objects (here: a clothes hanger) to obtain privileged object state information, which we use at train time to learn more effective policies, even though those policies do not rely on the privileged data at inference time.
Abstract: Large behaviour models have transformed robotic manipulation, but their prohibitive data requirements have so far prevented a revolution comparable to that brought about by vision-language models. We believe that instrumentation, i.e. the integration of sensors into objects, can provide invaluable state information and enable efficient, robust learning for robotic manipulation. In this paper, we study instrumented imitation learning for the task of clothes hanger insertion. Using 200 teleoperated demonstrations, we train and compare Diffusion Policies under multiple ways of leveraging instrumentation: as state input, via soft sensor estimation, as auxiliary prediction targets, and through vision backbone pretraining. Results show that incorporating instrumentation signals during training can improve success rates by up to 20 percentage points over a vision-only baseline, without requiring sensors at deployment. These findings demonstrate that instrumentation can be used effectively as privileged information to guide policy learning, offering a practical route toward more sample-efficient imitation learning for complex robotic manipulation tasks. Datasets are available on Zenodo [link redacted]; selected rollout videos are available on Google Drive.
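To make "instrumentation as auxiliary prediction targets" concrete, here is a minimal NumPy sketch under simplifying assumptions, not the paper's implementation. A shared tanh encoder (a stand-in for the vision backbone) feeds an action head and a sensor head. Diffusion Policy's denoising objective is replaced by plain action regression. The function names (`forward`, `loss_and_grads`, `train`, `act`), the weighting parameter `aux_weight` and the toy data are hypothetical. The sketch shows that sensor readings enter only the training loss, while `act` needs observations alone.

```python
import numpy as np

# Hypothetical sketch: regression stand-in for Diffusion Policy; all names are illustrative.

def forward(obs, params):
    """Shared encoder feeding two heads: actions and (privileged) sensor readings."""
    h = np.tanh(obs @ params["W_enc"])  # shared features (stand-in for a vision backbone)
    return h, h @ params["W_act"], h @ params["W_aux"]

def loss_and_grads(obs, actions, sensors, params, aux_weight):
    """Imitation MSE plus weighted auxiliary MSE on sensor targets (train time only)."""
    h, a_pred, s_pred = forward(obs, params)
    a_err, s_err = a_pred - actions, s_pred - sensors
    loss = np.mean(a_err ** 2) + aux_weight * np.mean(s_err ** 2)
    # Gradients of the two mean-squared-error terms w.r.t. the head outputs.
    d_a = 2.0 * a_err / a_err.size
    d_s = aux_weight * 2.0 * s_err / s_err.size
    # Both heads backpropagate into the shared encoder, so the sensor
    # targets shape the representation that the action head relies on.
    d_h = d_a @ params["W_act"].T + d_s @ params["W_aux"].T
    d_z = d_h * (1.0 - h ** 2)  # derivative of tanh
    grads = {
        "W_enc": obs.T @ d_z,
        "W_act": h.T @ d_a,
        "W_aux": h.T @ d_s,
    }
    return float(loss), grads

def train(obs, actions, sensors, hidden=16, aux_weight=0.5, lr=0.05, steps=300, seed=0):
    """Plain full-batch gradient descent on the combined loss."""
    rng = np.random.default_rng(seed)
    params = {
        "W_enc": rng.normal(scale=0.5, size=(obs.shape[1], hidden)),
        "W_act": rng.normal(scale=0.1, size=(hidden, actions.shape[1])),
        "W_aux": rng.normal(scale=0.1, size=(hidden, sensors.shape[1])),
    }
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(obs, actions, sensors, params, aux_weight)
        history.append(loss)
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

def act(obs, params):
    """Deployment: only observations are needed; the sensor head is ignored."""
    return forward(obs, params)[1]

# Toy synthetic data (hypothetical): observations, a 2-D "object state" that an
# instrumented object would report, and actions that depend on that state.
rng = np.random.default_rng(1)
obs = rng.normal(size=(200, 8))
sensors = np.tanh(obs[:, :2])
actions = sensors @ rng.normal(size=(2, 3))
params, history = train(obs, actions, sensors)
deployed_actions = act(obs, params)  # no sensor data passed at deployment
```

Setting `aux_weight=0.0` reduces the sketch to a vision-only regression baseline. With a positive weight, the sensor term changes `W_enc` only through the shared gradient, which is the sole path by which the privileged signal can influence the deployed actions.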
Submission Number: 35