A Minimalist Ensemble Method for Generalizable Offline Deep Reinforcement Learning

Kun Wu; Yinuo Zhao; Zhiyuan Xu; Zhen Zhao; Pei Ren; Zhengping Che; Chi Harold Liu; Feifei Feng; Jian Tang

Published: 27 Apr 2022, Last Modified: 05 May 2023
Venue: ICLR 2022 GPL Poster

Keywords: Offline Reinforcement Learning, Imitation Learning, Manipulation skills

TL;DR: In this paper, we propose a minimalist ensemble method based on imitation learning that trains a bundle of robust agents and achieves high success rates in the No Interaction Track of the ManiSkill challenge.

Abstract: Deep Reinforcement Learning (DRL) has achieved impressive performance in a variety of applications. However, most existing DRL methods require a large amount of active interaction with the environment, which is impractical in real-world scenarios. Moreover, most current evaluation environments are identical to the training environments, so the generalization ability of the agent is largely neglected. To fulfill the potential of DRL, an ideal policy should have 1) the ability to learn from a previously collected dataset (i.e., offline DRL) and 2) the ability to generalize to unseen scenarios and objects in the testing environments. Given expert demonstrations collected from the training environments, the goal is to improve the model's performance in both the training and testing environments without any further interaction. In this paper, we propose a minimalist ensemble method based on imitation learning: it trains a bundle of agents with simple modifications to the network architecture and tuned hyperparameters, then combines them into an ensemble model. To verify our method, we took part in the No Interaction Track of the SAPIEN Manipulation Skill (ManiSkill) Challenge and conducted extensive experiments on the ManiSkill Benchmark. Our challenge ranking and experimental results demonstrate the effectiveness of the method.
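The abstract does not state how the agents are trained or combined, so the following is only a minimal sketch of the general idea under explicit assumptions. Each "agent" here is a linear behavior-cloning policy fit by ridge regression (a stand-in for the neural networks used in the paper), the members differ only in a hypothetical `l2` hyperparameter, and the ensemble combines continuous actions by simple averaging. The names `train_bc_linear` and `ensemble_policy` are illustrative, not from the paper.

```python
import numpy as np


def train_bc_linear(obs, acts, l2=1e-3):
    """Fit a linear behavior-cloning policy a = [o, 1] @ W by ridge regression.

    Assumption: a linear model stands in for the paper's neural network agents.
    obs: (N, obs_dim) observations; acts: (N, act_dim) expert actions.
    """
    X = np.hstack([obs, np.ones((obs.shape[0], 1))])  # append a bias column
    d = X.shape[1]
    # Closed-form ridge solution: W = (X^T X + l2 * I)^-1 X^T A
    W = np.linalg.solve(X.T @ X + l2 * np.eye(d), X.T @ acts)
    return lambda o: np.append(o, 1.0) @ W


def ensemble_policy(policies):
    """Combine agents by averaging their continuous actions (assumed rule)."""
    def act(o):
        return np.mean([p(o) for p in policies], axis=0)
    return act


if __name__ == "__main__":
    # Synthetic "expert demonstrations" (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    obs = rng.normal(size=(200, 4))
    M = rng.normal(size=(4, 2))
    b = np.array([0.5, -0.5])
    acts = obs @ M + b + 0.01 * rng.normal(size=(200, 2))

    # Each member uses a different (hypothetical) regularization strength,
    # mimicking "a bundle of agents" that differ in their hyperparameters.
    members = [train_bc_linear(obs, acts, l2=l2) for l2 in (1e-4, 1e-2, 1.0)]
    policy = ensemble_policy(members)
    print("ensemble action:", policy(obs[0]), "expert action:", acts[0])
```

Averaging is the simplest combination rule for continuous action spaces; other choices (e.g., weighting members by validation success rate) would fit the same interface.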
