MetaMVUC: Sim-to-Real Active Domain Adaptation based on Multi-View Uncertainty and Metadata for Sample-Efficient Robotic Grasping

Published: 26 Jun 2024 · Last Modified: 09 Jul 2024 · DGR@RSS2024 Poster · CC BY 4.0
Keywords: Active Learning, Robot Grasping, Active Domain Adaptation, Multi-View Uncertainty, Metadata Diversity, Sample Efficient Learning
TL;DR: We present an active learning framework for sample-efficient sim-to-real domain adaptation of grasping robots based on our proposed MetaMVUC query strategy, which combines multi-view uncertainty and metadata diversity scoring.
Abstract: Good generalization of learning-based robotic grasping systems to unknown target data domains requires training on large-scale datasets. However, collecting such datasets is very costly and time-consuming. In addition, these systems often have limited zero-shot performance, especially when they are trained on synthetic data. To overcome these limitations of passive robot learning, we establish a novel active learning framework that enables fast and sample-efficient adaptation to a new real-world target data domain. Our proposed learning framework uses synthetic data as a starting point and then selects the most informative real-world target data samples for incremental domain adaptation. For this purpose, we propose a novel query strategy, MetaMVUC, which leverages multi-view uncertainty and metadata diversity. Our strategy uses multiple viewpoints of the scene to reason about model uncertainty by matching predictions across viewpoints and identifying the samples with the highest uncertainty. Additionally, since robots in industry and logistics often operate in environments rich in metadata, MetaMVUC utilizes this metadata to select diverse and well-distributed samples. Experimental results on the MGNv2 dataset and in our physical robot cell clearly demonstrate the effectiveness and robustness of our proposed learning framework built upon MetaMVUC. Real grasp experiments show that with only 16 out of 324 annotated data samples, our system achieves grasp success rates of more than 87% for seen objects and 80% for novel objects. When the annotation budget is increased to 40 samples, the robot achieves grasp success rates of more than 90% for both seen and novel objects.
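The abstract only outlines the query strategy at a high level; the sketch below is an illustrative interpretation, not the authors' implementation. It assumes (hypothetically) that each scene comes with matched grasp-quality predictions from several viewpoints and a categorical metadata label, uses prediction variance across views as the uncertainty term, and greedily trades that off against metadata coverage under a fixed annotation budget.

```python
# Illustrative MetaMVUC-style query sketch (assumptions, not the paper's code):
# per-view grasp-quality scores as a NumPy array and one metadata label per sample.
import numpy as np


def multi_view_uncertainty(view_predictions: np.ndarray) -> np.ndarray:
    """Disagreement of matched predictions across viewpoints.

    view_predictions: shape (n_samples, n_views), grasp-quality scores predicted
    from each viewpoint of the same scene. Higher variance across views is
    treated here as higher model uncertainty.
    """
    return view_predictions.var(axis=1)


def metamvuc_select(view_predictions: np.ndarray,
                    metadata: list[str],
                    budget: int,
                    alpha: float = 0.5) -> list[int]:
    """Greedily pick `budget` samples, trading off uncertainty and metadata diversity."""
    uncertainty = multi_view_uncertainty(view_predictions)
    # Normalise uncertainty to [0, 1] so both terms are comparable.
    u = (uncertainty - uncertainty.min()) / (uncertainty.max() - uncertainty.min() + 1e-8)

    selected: list[int] = []
    counts: dict[str, int] = {}
    for _ in range(budget):
        best_idx, best_score = -1, -np.inf
        for i in range(len(metadata)):
            if i in selected:
                continue
            # Diversity bonus: prefer metadata values not yet well covered.
            diversity = 1.0 / (1.0 + counts.get(metadata[i], 0))
            score = alpha * u[i] + (1.0 - alpha) * diversity
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        counts[metadata[best_idx]] = counts.get(metadata[best_idx], 0) + 1
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preds = rng.random((324, 4))                 # 324 scenes, 4 viewpoints each
    meta = [f"bin_{i % 6}" for i in range(324)]  # hypothetical metadata labels
    print(metamvuc_select(preds, meta, budget=16)[:5])
```

The greedy loop mirrors the budget sizes quoted above (16 and 40 annotated samples); the variance-based uncertainty and the inverse-count diversity bonus are placeholder choices for how viewpoint matching and metadata scoring could be combined.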
Supplementary Material: zip
Submission Number: 4