The Mirage of Action-Dependent Baselines in Reinforcement LearningDownload PDFOpen Website

2018 (modified: 11 Nov 2022)ICML 2018Readers: Everyone
Abstract: Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers...
0 Replies

Loading