AugInsert: Learning Robust Visual-Force Policies via Data Augmentation for Object Assembly Tasks

Published: 01 May 2025, Last Modified: 14 May 2025
Venue: ICRA 2025 Workshop: Beyond Pick and Place (Spotlight Poster)
License: CC BY 4.0
Keywords: robustness, contact-rich, assembly, multisensory, manipulation
TL;DR: We perform an extensive evaluation of a multisensory policy framework on a contact-rich peg-in-hole object manipulation task.
Abstract: Operating in unstructured environments like households requires robotic policies that are robust to out-of-distribution conditions. Although much work has been done on evaluating the robustness of visuomotor policies, the robustness of multisensory approaches that include force-torque sensing remains largely unexplored. This work introduces a novel, factor-based evaluation framework with the goal of assessing the robustness of multisensory policies in a peg-in-hole assembly task. To this end, we develop a multisensory policy framework utilizing the Perceiver IO architecture to learn the task. We investigate which factors pose the greatest generalization challenges in object assembly and explore a simple multisensory data augmentation technique to enhance out-of-distribution performance. We provide a simulation environment enabling controlled evaluation of these factors. Our results reveal that multisensory variations such as Grasp Pose present the most significant challenges for robustness, and that naive unisensory data augmentation applied independently to each sensory modality proves insufficient to overcome them. Additionally, we find force-torque sensing to be the most informative modality for our contact-rich assembly task, with vision being the least informative. For additional experiments and qualitative results, we refer the reader to the project webpage: https://bit.ly/47skWXH.
Submission Number: 13
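To make the abstract's notion of "naive unisensory data augmentation" concrete, the sketch below perturbs each sensory modality independently. This is a minimal illustration, not the paper's actual pipeline: the modality names (`rgb`, `force_torque`, `proprio`) and the specific perturbations are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_rgb(image: np.ndarray) -> np.ndarray:
    """Perturb an RGB observation (H, W, 3) with brightness jitter and pixel noise."""
    jitter = rng.uniform(0.8, 1.2)                    # global brightness scale
    noise = rng.normal(0.0, 0.02, size=image.shape)   # per-pixel Gaussian noise
    return np.clip(image * jitter + noise, 0.0, 1.0)

def augment_force_torque(wrench: np.ndarray, sigma: float = 0.05) -> np.ndarray:
    """Perturb a 6-D force-torque reading with additive Gaussian noise."""
    return wrench + rng.normal(0.0, sigma, size=wrench.shape)

def augment_sample(sample: dict) -> dict:
    """Naive unisensory augmentation: each modality is perturbed independently,
    so cross-modal consistency (e.g. the grasp pose jointly implied by the
    image and the wrench) is not preserved."""
    return {
        "rgb": augment_rgb(sample["rgb"]),
        "force_torque": augment_force_torque(sample["force_torque"]),
        "proprio": sample["proprio"],  # left unchanged in this sketch
    }

# Usage on a dummy observation
sample = {
    "rgb": rng.uniform(0.0, 1.0, size=(84, 84, 3)),
    "force_torque": rng.normal(0.0, 1.0, size=(6,)),
    "proprio": rng.normal(0.0, 1.0, size=(7,)),
}
augmented = augment_sample(sample)
```

Because each modality is perturbed in isolation, such augmentation cannot model correlated variation across sensors, which is consistent with the abstract's finding that it is insufficient against multisensory factors like Grasp Pose.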