Keywords: Imitation Learning, Sim-to-real, Multitask Learning
TL;DR: We introduce Point Bridge, a framework that uses unified domain-agnostic point-based representations to unlock the potential of synthetic simulation datasets and enable zero-shot sim-to-real policy transfer.
Abstract: Robot foundation models are starting to realize some of the promise of generalist robotic agents, but progress remains bottlenecked by the availability of large-scale real-world robotic manipulation datasets. Simulation and synthetic data generation offer a promising alternative for meeting this data need, but the utility of synthetic data for training visuomotor policies remains limited by the visual domain gap between simulation and the real world. In this work, we introduce POINT BRIDGE, a framework that uses unified, domain-agnostic point-based representations to unlock the potential of synthetic simulation datasets and enable zero-shot sim-to-real policy transfer without explicit visual or object-level alignment across domains. POINT BRIDGE combines automated point-based representation extraction via Vision-Language Models (VLMs), transformer-based policy learning, and inference-time pipelines that balance accuracy and computational efficiency, establishing a system that can train capable real-world manipulation agents from purely synthetic data. POINT BRIDGE can further benefit from co-training on small sets of real-world demonstrations, yielding high-quality manipulation agents that substantially outperform prior vision-based sim-and-real co-training approaches. POINT BRIDGE yields improvements of up to 44% on zero-shot sim-to-real transfer and up to 66% when co-trained with a small amount of real data, and it also facilitates multi-task learning. Videos of the robot are best viewed at: https://pointbridge-anon.github.io/
Primary Area: applications to robotics, autonomy, planning
Submission Number: 7801