Keywords: model-free reinforcement learning, model-based reinforcement learning, Bayesian neural network, deep learning, reinforcement learning
Abstract: We propose a new combination of model-based and model-free reinforcement learning (RL) that draws on the strengths of both approaches. Our goal is to reduce the sample complexity of model-free policy gradient methods by performing fictitious trajectory rollouts on a learned dynamics model, while maintaining the same asymptotic behaviour. We propose a particular form of uncertainty quantification in which the dynamics model is stochastic and the next-state prediction is drawn at random from the distribution it predicts. Basing next-state predictions on an uncertainty-aware ensemble of such dynamics models mitigates the negative effect of exploiting erroneously optimistic regions of the learned model. The influence of the ensemble on the policy update is controlled by adjusting the number of virtual rollouts performed in the next iteration according to the ratio of the real to the virtual total reward. Our approach, which we call Model-Based Policy Gradient Enrichment (MBPGE), is evaluated on a collection of benchmarks, including simulated robotic locomotion. We compare it to plain model-free algorithms and to a model-based one. Our evaluation shows that MBPGE learns faster in the early stages of training and attains improved asymptotic behaviour.
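The abstract describes two mechanisms: drawing next-state predictions from a randomly chosen member of a stochastic dynamics-model ensemble, and scaling the number of virtual rollouts by the ratio of real to virtual total reward. The following is a minimal sketch of these ideas, not the authors' implementation; the class, function names, and the clipping rule are assumptions made for illustration.

```python
import numpy as np

class EnsembleDynamics:
    """Hypothetical ensemble of probabilistic dynamics models.

    Each member maps (state, action) to the mean and standard deviation of a
    Gaussian over the next state. A prediction is drawn by first picking a
    random ensemble member and then sampling from its predicted distribution,
    which is the stochastic, uncertainty-aware prediction the abstract refers to.
    """

    def __init__(self, members):
        self.members = members  # callables: (state, action) -> (mean, std)

    def sample_next_state(self, state, action, rng):
        member = self.members[rng.integers(len(self.members))]
        mean, std = member(state, action)
        return rng.normal(mean, std)


def adjust_num_rollouts(num_rollouts, real_return, virtual_return,
                        min_rollouts=1, max_rollouts=100):
    """Hypothetical scheduling rule: scale the number of fictitious rollouts
    for the next iteration by the ratio of real to virtual total reward, so a
    poorly matching model contributes fewer virtual samples to the policy update."""
    ratio = real_return / max(abs(virtual_return), 1e-8)
    return int(np.clip(num_rollouts * ratio, min_rollouts, max_rollouts))
```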