On Merits of Biased Gradient Estimates for Meta Reinforcement Learning

Anonymous

30 Sept 2021 (modified: 05 May 2023) · NeurIPS 2021 Workshop MetaLearn · Blind Submission · Readers: Everyone
Keywords: meta reinforcement learning, reinforcement learning, meta learning, stochastic gradient descent
TL;DR: Biased gradient estimates significantly reduce variance compared to unbiased estimates for the meta-RL objective.
Abstract: Despite the empirical success of meta reinforcement learning (meta-RL), there remain a number of poorly understood discrepancies between theory and practice. Critically, biased gradient estimates are almost always implemented in practice, whereas prior theory on meta-RL only establishes convergence under unbiased gradient estimates. In this work, (1) we show that unbiased gradient estimates have variance $O(N)$, which grows linearly with the sample size $N$ of the inner-loop updates; (2) we propose linearized score function (LSF) gradient estimates, which have bias $O(1/\sqrt{N})$ and variance $O(1/N)$; (3) we show that most empirical prior work in fact implements variants of the LSF estimates; (4) we establish convergence guarantees for the LSF estimates in meta-RL, showing better dependency on $N$ than prior work.
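The $O(N)$ variance of the unbiased estimator can be illustrated with a toy example. The sketch below is not the paper's LSF estimator; it is a hypothetical one-dimensional setup (Gaussian samples, a simple per-sample objective, names like `f` and `theta` are our own) showing the underlying phenomenon: multiplying a batch-level objective by the score of the joint likelihood of $N$ samples yields an unbiased gradient estimate whose variance grows with $N$, while an estimator that keeps only the matching per-sample score terms has variance shrinking as $O(1/N)$. (In this toy both estimators happen to be unbiased; in the meta-RL setting described above, dropping cross terms is what introduces the $O(1/\sqrt{N})$ bias.)

```python
import numpy as np

rng = np.random.default_rng(0)
theta, N, trials = 0.5, 64, 20_000

def f(x):
    # Per-sample objective; J(theta) = E[x^2] = theta^2 + 1, so dJ/dtheta = 2*theta.
    return x ** 2

full_est, lin_est = [], []
for _ in range(trials):
    x = rng.normal(theta, 1.0, size=N)      # N i.i.d. samples from N(theta, 1)
    score = x - theta                       # d/dtheta log N(x_i; theta, 1)
    F = f(x).mean()                         # batch-level objective
    # Joint score-function estimator: F times the score of the joint likelihood
    # (sum of N per-sample scores) -> variance grows linearly in N.
    full_est.append(F * score.sum())
    # "Linearized" variant: keep only matching (i == j) terms -> variance O(1/N).
    lin_est.append((f(x) * score).mean())

full_est, lin_est = np.array(full_est), np.array(lin_est)
print("means:", full_est.mean(), lin_est.mean())  # both near dJ/dtheta = 1.0
print("vars: ", full_est.var(), lin_est.var())
```

Running this shows both estimators centered near the true gradient $2\theta = 1.0$, but the joint-score estimator's empirical variance is orders of magnitude larger at $N = 64$, matching the $O(N)$ vs $O(1/N)$ scaling the abstract describes.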