Keywords: Reinforcement Learning, Intrinsic Reward, Dynamics Model, Ensemble
TL;DR: For sparse-reward reinforcement learning, the ensemble of multiple dynamics models is used to generate intrinsic reward designed as the minimum of the surprise.
Abstract: In this paper, a new intrinsic reward generation method for sparse-reward reinforcement learning is proposed based on an ensemble of dynamics models. In the proposed method, the mixture of multiple dynamics models is used to approximate the true unknown transition probability, and the intrinsic reward is designed as the minimum of the surprise seen from each dynamics model to the mixture of the dynamics models. In order to show the effectiveness of the proposed intrinsic reward generation method, a working algorithm is constructed by combining the proposed intrinsic reward generation method with the proximal policy optimization (PPO) algorithm. Numerical results show that for representative locomotion tasks, the proposed model-ensemble-based intrinsic reward generation method outperforms the previous methods based on a single dynamics model.
Original Pdf: pdf
9 Replies
Loading