Abstract: We propose a meta-learning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss, parametrized via temporal convolutions over the agent's experience, enables fast task learning and eliminates the need for reward shaping at test time. Empirical results show that our evolved policy gradient algorithm achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method.
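To make the setup concrete, below is a minimal sketch of the inner loop, assuming PyTorch and toy stand-in data. The `EvolvedLoss` module, the choice of per-step features, and all hyperparameters are illustrative assumptions, not the paper's implementation: a learned loss applies temporal convolutions over the agent's recent experience, and the agent updates its policy by gradient descent on that loss.

```python
import torch
import torch.nn as nn

class EvolvedLoss(nn.Module):
    """Hypothetical learned loss: temporal convolutions over per-step
    experience features (here: observation, action one-hot, reward,
    policy log-prob), pooled over time and reduced to a scalar."""
    def __init__(self, feat_dim, hidden=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU())
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats):                  # feats: (1, feat_dim, T)
        h = self.conv(feats).mean(dim=2)       # pool over time
        return self.head(h).squeeze()          # scalar loss

obs_dim, n_actions, T = 4, 2, 64
policy = nn.Linear(obs_dim, n_actions)         # toy softmax policy
loss_fn = EvolvedLoss(feat_dim=obs_dim + n_actions + 1 + 1)

# Inner loop: one gradient step of the policy on the evolved loss.
obs = torch.randn(T, obs_dim)                  # stand-in for rolled-out data
dist = torch.distributions.Categorical(logits=policy(obs))
actions = dist.sample()
rewards = torch.randn(T)                       # stand-in for env rewards
feats = torch.cat([obs,
                   nn.functional.one_hot(actions, n_actions).float(),
                   rewards.unsqueeze(1),
                   dist.log_prob(actions).unsqueeze(1)], dim=1)
loss = loss_fn(feats.t().unsqueeze(0))         # reshape to (1, feat_dim, T)

opt = torch.optim.SGD(policy.parameters(), lr=1e-2)
opt.zero_grad()
loss.backward()                                # grads flow via log-probs
opt.step()
```

The outer loop is not shown: consistent with the abstract's "evolve," the loss parameters would be perturbed in an evolutionary fashion, keeping perturbations whose agents, after inner-loop training, achieve high reward.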
Keywords: meta-learning
TL;DR: We evolve a differentiable loss function for gradient-based RL; agents optimizing their policies against it learn faster on several randomized environments than with an off-the-shelf policy gradient method.