Model Ensemble-Based Intrinsic Reward for Sparse Reward Reinforcement LearningDownload PDF

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone
Keywords: Reinforcement Learning, Intrinsic Reward, Dynamics Model, Ensemble
TL;DR: For sparse-reward reinforcement learning, the ensemble of multiple dynamics models is used to generate intrinsic reward designed as the minimum of the surprise.
Abstract: In this paper, a new intrinsic reward generation method for sparse-reward reinforcement learning is proposed based on an ensemble of dynamics models. In the proposed method, the mixture of multiple dynamics models is used to approximate the true unknown transition probability, and the intrinsic reward is designed as the minimum of the surprise seen from each dynamics model to the mixture of the dynamics models. In order to show the effectiveness of the proposed intrinsic reward generation method, a working algorithm is constructed by combining the proposed intrinsic reward generation method with the proximal policy optimization (PPO) algorithm. Numerical results show that for representative locomotion tasks, the proposed model-ensemble-based intrinsic reward generation method outperforms the previous methods based on a single dynamics model.
Original Pdf: pdf
9 Replies

Loading