Don’t Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning

11 May 2022 (modified: 05 May 2023) · L-DOD 2022
Keywords: reinforcement learning, reset-free, autonomous, robotics, generalization, sample-efficiency, offline data
TL;DR: Prior offline interaction data can enable autonomous, efficient, and generalizable robot learning.
Abstract: Reinforcement learning (RL) algorithms typically require a substantial amount of data, which may be time-consuming to collect with a robot, as well as the ability to freely return to an initial state to continue practicing a task, which requires laborious human intervention in the real world. Moreover, robotic policies learned with RL often fail when deployed beyond the carefully controlled setting in which they were learned. In this work, we demonstrate that these varied challenges of real-world robotic learning can all be tackled by effective utilization of diverse offline interaction datasets collected from previously seen tasks. While much prior work on robotic RL has focused on learning from scratch and has attempted to solve each of the above problems in isolation, we devise a system that uses prior offline datasets to tackle all of these challenges together. Our system first uses techniques from offline reinforcement learning to extract useful skills and representations from prior offline data, which gives the agent a baseline ability to perceive and manipulate the world around it. Then, when faced with a new task, our system adapts these skills to quickly learn both to perform the new task and to return the environment to an initial state, effectively learning to perform its own environment reset. We show that training on prior data gives rise to behaviors that generalize to far more varied conditions than those learned without it. We evaluate our method on a suite of challenging robotic manipulation tasks, involving high-dimensional visual observations and sparse binary reward functions, both in the real world and in simulation. Our empirical results demonstrate that incorporating prior data into robotic reinforcement learning enables autonomous learning, substantially improves the sample efficiency of learning, and results in policies that generalize better.
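
To make the two-phase recipe described in the abstract concrete, below is a minimal, hypothetical Python sketch: pretrain forward and backward policies on prior offline data, then alternate online between the forward policy attempting the new task and the backward policy returning the environment to an initial state, each driven by a sparse binary reward. Every name here (ToyEnv, Agent, ReplayBuffer, pretrain_offline, reset_free_training) is an illustrative assumption rather than the paper's implementation; in particular, Agent.update is a stub where a real system would take offline RL gradient steps over image observations.

```python
import random
from collections import deque


class ReplayBuffer:
    """FIFO transition store; random.sample works directly on a deque."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))


class Agent:
    """Stand-in policy. A real system would run an offline RL
    actor-critic over visual observations here."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, obs):
        return random.choice(self.actions)  # placeholder for pi(a|s)

    def update(self, batch):
        pass  # placeholder for a gradient step on the sampled batch


class ToyEnv:
    """Minimal 1-D stand-in environment with sparse binary rewards."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = max(0, min(10, self.state + action))
        return self.state

    def task_reward(self, obs):   # sparse: 1 only when the task is done
        return float(obs == 10)

    def reset_reward(self, obs):  # sparse: 1 only at the initial state
        return float(obs == 0)


def pretrain_offline(agent, prior_dataset, steps=1_000, batch_size=256):
    """Phase 1: extract skills/representations from prior offline data."""
    for _ in range(steps):
        agent.update(random.sample(prior_dataset,
                                   min(batch_size, len(prior_dataset))))


def reset_free_training(env, forward, backward, fwd_buf, bwd_buf,
                        rounds=100, horizon=50, batch_size=256):
    """Phase 2: the forward policy practices the new task; the backward
    policy restores an initial state, replacing human resets."""
    obs = env.reset()  # a single physical reset at the very start
    for _ in range(rounds):
        for agent, buf, reward_fn in ((forward, fwd_buf, env.task_reward),
                                      (backward, bwd_buf, env.reset_reward)):
            for _ in range(horizon):
                action = agent.act(obs)
                next_obs = env.step(action)
                buf.add((obs, action, reward_fn(next_obs), next_obs))
                agent.update(buf.sample(batch_size))
                obs = next_obs


env = ToyEnv()
actions = [-1, 0, 1]
forward, backward = Agent(actions), Agent(actions)
prior_data = [(0, 1, 0.0, 1)] * 512  # stand-in for diverse prior datasets
pretrain_offline(forward, prior_data)
pretrain_offline(backward, prior_data)
reset_free_training(env, forward, backward, ReplayBuffer(), ReplayBuffer())
```

The key structural point the sketch captures is that the backward policy is rewarded for restoring the initial state, so the robot generates its own environment resets and the online loop can run without per-trial human intervention.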