Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation

Ryan Julian; Benjamin Swanson; Gaurav S. Sukhatme; Sergey Levine; Chelsea Finn; Karol Hausman

12 Jun 2020 (modified: 05 May 2023), LifelongML@ICML2020, Readers: Everyone

Student First Author: Yes

Keywords: fine-tuning, continual learning, deep RL, vision, robotics, manipulation, lifelong learning

Abstract: One of the great promises of robot learning systems is that they will be able to learn from their mistakes and continuously adapt to ever-changing environments, but most robot learning systems today are deployed as fixed policies which do not adapt after deployment. Can we efficiently adapt previously learned behaviors to new environments, objects and percepts in the real world? We present empirical evidence towards a robot learning framework that facilitates continuous adaptation. We demonstrate how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning, using less than 0.2% of the data necessary to learn the task from scratch. We find that the simple approach of fine-tuning pre-trained policies leads to substantial performance gains over the course of fine-tuning, and that pre-training via RL is essential: training from scratch or adapting from supervised ImageNet features are both unsuccessful with such small amounts of data. We also find that these positive results hold in a limited continual learning setting, in which we repeatedly fine-tune a single lineage of policies using data from a succession of new tasks. Our empirical conclusions are consistently supported by experiments on simulated manipulation tasks, and by 52 unique fine-tuning experiments on a real robotic grasping system pre-trained on 580,000 grasps.

TL;DR: We demonstrate how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning, using less than 0.2% of the data necessary to learn the task from scratch.
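To make the "less than 0.2%" figure concrete: if one assumes (our assumption; the abstract does not state this equivalence) that the 580,000 pre-training grasps approximate the data needed to learn grasping from scratch, then the fine-tuning budget is bounded by 0.2% × 580,000 = 1,160 grasps. The sketch below does this arithmetic in integer basis points (0.2% = 20 bp) to avoid floating-point rounding. The function names are hypothetical and are not part of the authors' system.

```python
# Hypothetical helpers illustrating the "less than 0.2% of the data" claim.
# Assumption (not stated by the paper): the 580,000 pre-training grasps
# stand in for "the data necessary to learn the task from scratch".

BP_PER_UNIT = 10_000  # basis points per whole (1 bp = 0.01%)


def budget_from_fraction(scratch_size: int, fraction_bp: int) -> int:
    """Number of samples equal to `fraction_bp` basis points of `scratch_size`.

    Integer arithmetic; the result is floored if not exact.
    """
    return scratch_size * fraction_bp // BP_PER_UNIT


def is_below_fraction(n_finetune: int, scratch_size: int, fraction_bp: int) -> bool:
    """True if `n_finetune` is strictly less than `fraction_bp` of `scratch_size`."""
    # Cross-multiply so the comparison stays exact for integers.
    return n_finetune * BP_PER_UNIT < scratch_size * fraction_bp


PRETRAIN_GRASPS = 580_000
MAX_FRACTION_BP = 20  # 0.2%

if __name__ == "__main__":
    bound = budget_from_fraction(PRETRAIN_GRASPS, MAX_FRACTION_BP)
    print(f"0.2% of {PRETRAIN_GRASPS:,} grasps = {bound:,} grasps")
```

Under that assumption, any fine-tuning dataset for which `is_below_fraction(n, 580_000, 20)` holds (that is, fewer than 1,160 grasps) is consistent with the stated bound.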
