Abstract: Robot learning has emerged as a promising tool for taming the complexity and diversity of the real world. Methods based on high-capacity models, such as deep networks, hold the promise of providing effective generalization to a wide range of open-world environments. However, these same methods
typically require large amounts of diverse training data to generalize effectively.
In contrast, most robotic learning experiments are small-scale, single-domain,
and single-robot. This leads to a frequent tension in robotic learning: how can
we learn generalizable robotic controllers without having to collect impractically
large amounts of data for each separate experiment? In this paper, we propose
RoboNet, an open database for sharing robotic experience, which provides an initial pool of 15 million video frames from 7 different robot platforms, and we study
how it can be used to learn generalizable models for vision-based robotic manipulation. We combine the dataset with two different learning algorithms: visual
foresight, which uses forward video prediction models, and supervised inverse
models. Our experiments test the learned algorithms’ ability to work across new
objects, new tasks, new scenes, new camera viewpoints, new grippers, or even entirely new robots. In our final experiment, we find that by pre-training on RoboNet
and fine-tuning on data from a held-out Franka or Kuka robot, we can exceed the
performance of a robot-specific training approach that uses 4x-20x more data.