On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning

May 21, 2021 (edited Oct 26, 2021) · NeurIPS 2021 Poster
  • Keywords: meta-learning theory, reinforcement learning theory, optimization
  • TL;DR: We resolve the bias issue in the update of original Model-Agnostic Meta-Learning (MAML) method for the reinforcement learning problem and provide convergence guarantees for our method.
  • Abstract: We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to use data from several tasks, each represented by a Markov Decision Process (MDP), to find a policy that performs well after one step of \textit{stochastic} policy gradient on the realized MDP. Using stochastic gradients in the MAML update steps is crucial for RL problems, since computing exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL to find an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of stochastic policy gradient is used at test time. Finally, we empirically compare SG-MRL and MAML in several deep RL environments.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/kristian-georgiev/SGMRL
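A minimal sketch of why a stochastic inner step matters, written under our own assumptions rather than taken from the authors' code (their implementation is in the repository linked above). If the inner update uses a stochastic policy gradient $\tilde\nabla J(\theta,\mathcal{D})$ computed from samples $\mathcal{D}$ drawn with the current policy, the meta-objective for one task is $V(\theta)=\mathbb{E}_{\mathcal{D}\sim q(\cdot;\theta)}\big[J(\theta+\alpha\tilde\nabla J(\theta,\mathcal{D}))\big]$. Because $\mathcal{D}$ itself depends on $\theta$, the gradient of $V$ contains a score-function term $J(\theta+\alpha\tilde\nabla J(\theta,\mathcal{D}))\,\nabla_\theta\log q(\mathcal{D};\theta)$. One reading of the bias issue in the TL;DR is the effect of omitting such a term; the exact SG-MRL estimator may differ. The notation, the one-step Gaussian-policy toy task (a single task, with no task distribution), the hypothetical function names, and the use of the exact $J$ and $\nabla J$ at the adapted parameters are all illustrative assumptions.

```python
import numpy as np

# Illustrative toy (assumption, not the SG-MRL reference code): a one-step
# task with scalar Gaussian policy a ~ N(theta, sigma^2) and reward
# r(a) = -(a - c)^2, so J(theta) = -((theta - c)^2 + sigma^2) in closed form.


def meta_gradient_estimates(theta, c, sigma, alpha, batch, n_draws, seed=0):
    """Monte Carlo estimates of d/dtheta E_D[J(theta + alpha * g(theta, D))].

    g is a REINFORCE estimate of grad J built from `batch` sampled actions D.
    Returns (debiased, biased); the biased one drops the score-function term
    that accounts for D being sampled from the policy at theta.
    """
    rng = np.random.default_rng(seed)
    a = theta + sigma * rng.standard_normal((n_draws, batch))  # inner samples D
    r = -(a - c) ** 2                                           # rewards
    score = (a - theta) / sigma**2                              # d/dtheta log pi(a; theta)
    g = np.mean(r * score, axis=1)                              # stochastic policy gradient
    dg = -np.mean(r, axis=1) / sigma**2                         # dg/dtheta with D held fixed
    theta_new = theta + alpha * g                               # one inner adaptation step
    # Outer evaluation uses the toy's exact J and grad J (simplifying assumption).
    j_new = -((theta_new - c) ** 2 + sigma**2)
    grad_j_new = -2.0 * (theta_new - c)
    pathwise = (1.0 + alpha * dg) * grad_j_new                  # chain rule through the update
    score_term = j_new * np.sum(score, axis=1)                  # J(theta') * grad log q(D; theta)
    return float(np.mean(pathwise + score_term)), float(np.mean(pathwise))


def exact_meta_gradient(theta, c, sigma, alpha, batch):
    """Closed-form dV/dtheta for the toy, used as a reference value.

    With d = theta - c, V = -(d * (1 - 2 * alpha))**2 - alpha**2 * Var(g) - sigma**2,
    where Var(g) = ((d**4 + 18 d**2 sigma**2 + 15 sigma**4) / sigma**2 - 4 d**2) / batch.
    """
    d = theta - c
    dvar = ((4 * d**3 + 36 * d * sigma**2) / sigma**2 - 8 * d) / batch  # dVar(g)/dd
    return -2.0 * (1 - 2 * alpha) ** 2 * d - alpha**2 * dvar


if __name__ == "__main__":
    est, naive = meta_gradient_estimates(0.0, 1.0, 1.0, 0.1, 4, 400_000)
    print(exact_meta_gradient(0.0, 1.0, 1.0, 0.1, 4), est, naive)
```

For $\theta=0$, $c=1$, $\sigma=1$, $\alpha=0.1$ and a batch of 4 actions, a hand calculation for this toy gives an exact meta-gradient of 1.36; the debiased estimate should land close to it, while the estimate without the score term concentrates near 1.86. Subtracting a constant baseline from $J$ in the score term would reduce its variance without changing its expectation.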