AW-Opt: Learning Robotic Skills with Imitation and Reinforcement at Scale

Yao Lu; Karol Hausman; Yevgen Chebotar; Mengyuan Yan; Eric Jang; Alexander Herzog; Ted Xiao; Alex Irpan; Mohi Khansari; Dmitry Kalashnikov; Sergey Levine

Published: 13 Sept 2021, Last Modified: 04 Aug 2025, CoRL 2021 Poster, Readers: Everyone

Keywords: reinforcement learning, imitation learning

Abstract: Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression and QT-Opt, providing a unified approach for integrating demonstrations and offline data for robotic manipulation.

Supplementary Material: zip

Poster: png

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/aw-opt-learning-robotic-skills-with-imitation/code)
