Student First Author: yes
Keywords: Reinforcement learning, Data augmentation, Imaginary exploration, Optimistic initialization
TL;DR: Exploratory actions in learned model with optimistically-initialized critic improve value estimation and policy performance.
Abstract: In this paper, we propose a novel deep reinforcement learning approach that improves the sample efficiency of a model-free actor-critic method by using a learned model to encourage exploration. The basic idea is to generate artificial transitions with noisy actions, which are then used to update the critic. To counteract model bias, we introduce an optimistically high initialization for the critic and two filters for the artificial transitions. Finally, we evaluate our approach with the TD3 algorithm on several robotic tasks and show that it achieves better performance and higher sample efficiency than several other model-based and model-free methods.
Supplementary Material: zip