Evolved Policy Gradients

Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Feb 12, 2018 · ICLR 2018 Workshop Submission
  • Abstract: We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss, parametrized via temporal convolutions over the agent's experience, enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. (A toy sketch of this outer/inner-loop structure follows the list below.)
  • Keywords: meta-learning
  • TL;DR: We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method.
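The abstract describes a two-level procedure: an outer loop evolves the parameters phi of a differentiable loss function, and an inner loop trains an agent by gradient descent on that loss, with the agent's final return driving the evolution. Below is a minimal, self-contained Python sketch of that structure on a toy one-step bandit. Everything in it is an assumption made for illustration: the bandit environment, the two-term stand-in loss (the paper instead parametrizes the loss with temporal convolutions over the agent's experience), and all hyperparameters. It is not the authors' code, only a rough instance of the evolve-a-loss idea.

```python
import numpy as np

rng = np.random.default_rng(0)
TARGET = 1.5    # toy one-step bandit: reward = -(action - TARGET)^2
SIGMA_PI = 0.5  # fixed std of the Gaussian policy

def rollout(theta, n=32):
    """Sample n actions from the Gaussian policy N(theta, SIGMA_PI^2)."""
    actions = theta + SIGMA_PI * rng.normal(size=n)
    rewards = -(actions - TARGET) ** 2
    return actions, rewards

def evolved_loss(phi, theta, actions, rewards):
    """Hypothetical stand-in for the learned loss L_phi: a phi-weighted mix
    of a policy-gradient-style surrogate and a penalty on the policy mean.
    (The paper parametrizes L_phi with temporal convolutions instead.)"""
    logp = -((actions - theta) ** 2) / (2 * SIGMA_PI ** 2)
    return phi[0] * np.mean(-logp * rewards) + phi[1] * theta ** 2

def inner_loop(phi, steps=50, lr=0.05):
    """Inner loop: the agent minimizes L_phi by gradient descent on theta."""
    theta = 0.0
    for _ in range(steps):
        actions, rewards = rollout(theta)
        eps = 1e-4  # finite-difference gradient of L_phi w.r.t. theta
        g = (evolved_loss(phi, theta + eps, actions, rewards)
             - evolved_loss(phi, theta - eps, actions, rewards)) / (2 * eps)
        theta -= lr * g
    return theta

def final_return(theta):
    """Evaluate the trained agent by its average reward."""
    return rollout(theta, n=256)[1].mean()

# Outer loop: evolution strategies over the loss parameters phi. Each
# perturbed loss trains a fresh agent from scratch; phi then moves toward
# perturbations whose agents ended up with higher return.
phi = np.array([1.0, 0.0])
es_sigma, es_lr, pop = 0.1, 0.05, 16
for gen in range(20):
    noise = rng.normal(size=(pop, phi.size))
    returns = np.array([final_return(inner_loop(phi + es_sigma * z))
                        for z in noise])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    phi = phi + es_lr / (pop * es_sigma) * (noise.T @ adv)
    print(f"gen {gen:2d}  mean return {returns.mean():7.4f}  phi {phi}")
```

The point of the sketch is the structure, not the particulars: reward only enters the outer evolution-strategies update through the final return, while the agent itself only ever descends the learned loss. In this toy the loss is allowed to read the raw rewards for simplicity; in the paper the loss processes the agent's experience, which is what lets the learned loss stand in for shaped rewards at test time.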