Keywords: reinforcement learning, imitation learning
Abstract: Robotic skills can be learned via imitation learning (IL) using user-provided demonstrations, or via reinforcement learning (RL) using large amounts of autonomously collected experience. Both methods have complementary strengths and weaknesses: RL can reach a high level of performance, but requires exploration, which can be very time consuming and unsafe; IL does not require exploration, but only learns skills that are as good as the provided demonstrations. Can a single method combine the strengths of both approaches? A number of prior methods have aimed to address this question, proposing a variety of techniques that integrate elements of IL and RL. However, scaling up such methods to complex robotic skills that integrate diverse offline data and generalize meaningfully to real-world scenarios still presents a major challenge. In this paper, our aim is to test the scalability of prior IL + RL algorithms and devise a system based on detailed empirical experimentation that combines existing components in the most effective and scalable way. To that end, we present a series of experiments aimed at understanding the implications of each design decision, so as to develop a combined approach that can utilize demonstrations and heterogeneous prior data to attain the best performance on a range of real-world and realistic simulated robotic problems. Our complete method, which we call AW-Opt, combines elements of advantage-weighted regression and QT-Opt, providing a unified approach for integrating demonstrations and offline data for robotic manipulation.
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2111.05424/code)