Reward Estimation for Variance Reduction in Deep Reinforcement Learning
Joshua Romoff, Alexandre Piche, Peter Henderson, Vincent Francois-Lavet, Joelle Pineau
Feb 12, 2018 (modified: Jun 04, 2018), ICLR 2018 Workshop Submission
Abstract: In reinforcement learning (RL), stochastic environments can make learning a policy difficult because of high variance. Variance reduction methods, such as advantage estimation and control-variate estimation, have been investigated in other works. Here, we propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward signal. This results in theoretical reductions in variance in the tabular case, as well as empirical improvements in both the function approximation and tabular settings in environments where rewards are stochastic. To evaluate this, we use a modified version of Advantage Actor Critic (A2C) on variations of Atari games.
Keywords: Reinforcement Learning, Deep Learning
TL;DR: We propose to learn a separate reward estimator to train the value function, to help reduce variance caused by a noisy reward signal.
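The core idea in the abstract, training the value function on an estimated reward rather than the raw noisy sample, can be illustrated in the tabular setting. The sketch below is not the authors' implementation (the paper uses a modified A2C on Atari variants). It assumes a hypothetical 5-state chain with a true reward of 1 per step plus Gaussian noise, plain TD(0), and illustrative step sizes. The names `td0_update` and `run_chain` are invented for this example.

```python
import numpy as np


def td0_update(V, R_hat, s, r, s_next, done,
               gamma=0.9, alpha_v=0.1, alpha_r=0.05, use_estimator=True):
    """One tabular TD(0) step, optionally using a learned reward estimate."""
    # Move the per-state reward estimate toward the sampled (noisy) reward.
    R_hat[s] += alpha_r * (r - R_hat[s])
    # Use the smoothed estimate in the TD target instead of the raw sample.
    reward_for_target = R_hat[s] if use_estimator else r
    bootstrap = 0.0 if done else gamma * V[s_next]
    V[s] += alpha_v * (reward_for_target + bootstrap - V[s])


def run_chain(use_estimator, n_states=5, n_episodes=5000,
              noise_std=1.0, seed=0):
    """Hypothetical chain: visit states 0..n-1 in order, noisy reward each step."""
    rng = np.random.default_rng(seed)
    V = np.zeros(n_states)
    R_hat = np.zeros(n_states)
    history = np.empty(n_episodes)
    for ep in range(n_episodes):
        for s in range(n_states):
            r = 1.0 + noise_std * rng.normal()  # true mean reward is 1.0
            done = s == n_states - 1
            s_next = min(s + 1, n_states - 1)
            td0_update(V, R_hat, s, r, s_next, done,
                       use_estimator=use_estimator)
        history[ep] = V[0]  # track the start-state value after each episode
    return V, history


if __name__ == "__main__":
    _, raw = run_chain(use_estimator=False)
    _, est = run_chain(use_estimator=True)
    # Compare fluctuation of V[0] after an initial burn-in period.
    print("variance with raw rewards:      ", raw[1000:].var())
    print("variance with estimated rewards:", est[1000:].var())
```

In this toy setting the reward estimate acts as a running average of the noisy samples, so the TD target fluctuates less from step to step. The trade-off is that the estimate starts at zero and takes some time to converge, which is why the comparison discards an initial burn-in.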