Abstract: Deep reinforcement learning has achieved remarkable success in solving various challenging artificial intelligence tasks. A variety of algorithms have been introduced and improved toward human-level performance. Although technical advances have been made for each individual algorithm, there is strong evidence that further substantial improvements can be achieved by properly combining multiple approaches with different biases and variances. In this work, we propose to use the James-Stein (JS) shrinkage estimator to combine on-policy policy gradient estimators, which have low bias but high variance, with low-variance, high-bias gradient estimates such as those constructed from model-based methods or temporally smoothed averages of historical gradients. Empirical results show that our simple shrinkage approach is effective in practice and substantially improves the sample efficiency of state-of-the-art on-policy methods on various continuous control tasks.
Keywords: bias-variance trade-off, James-Stein estimator, reinforcement learning
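To make the combination concrete, below is a minimal NumPy sketch of positive-part James-Stein shrinkage applied to gradient estimates: an unbiased, high-variance gradient is shrunk toward a biased, low-variance anchor. The function name `js_combine` and the assumption of a known (or externally estimated) per-coordinate variance `sigma2` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def js_combine(g, b, sigma2):
    """Positive-part James-Stein shrinkage of an unbiased, high-variance
    gradient estimate `g` toward a biased, low-variance anchor `b`.

    NOTE: illustrative sketch; the paper's exact estimator may differ.

    g      : np.ndarray, shape (d,) -- unbiased on-policy gradient (high variance)
    b      : np.ndarray, shape (d,) -- low-variance, biased gradient (e.g.,
             model-based or a moving average of past gradients)
    sigma2 : float -- assumed per-coordinate variance of `g`
    """
    d = g.size  # JS shrinkage requires d >= 3 to dominate the raw estimate
    diff = g - b
    # Shrinkage factor: how much of the deviation from the anchor to keep.
    # The positive-part variant clips at zero to avoid over-shrinking.
    shrink = max(0.0, 1.0 - (d - 2) * sigma2 / (diff @ diff + 1e-12))
    return b + shrink * diff

# Example: combine a noisy Monte-Carlo gradient with a biased anchor
# (synthetic data; all quantities here are hypothetical).
rng = np.random.default_rng(0)
true_grad = np.ones(100)
g = true_grad + rng.normal(scale=2.0, size=100)  # unbiased, noisy
b = 0.8 * true_grad                              # biased, low variance
g_js = js_combine(g, b, sigma2=4.0)
```

A classical property motivating this choice: for dimension d >= 3, shrinking toward any fixed point dominates the raw unbiased estimate in mean squared error, so the combined gradient can only help on average even when the anchor is biased.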