ProMP: Proximal Meta-Policy Search

Jonas Rothfuss; Dennis Lee; Ignasi Clavera; Tamim Asfour; Pieter Abbeel

ProMP: Proximal Meta-Policy Search

Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

Published: 21 Dec 2018, Last Modified: 22 Jun 2025ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.

Keywords: Meta-Reinforcement Learning, Meta-Learning, Reinforcement-Learning

TL;DR: A novel and theoretically grounded meta-reinforcement learning algorithm

Code: [![github](/images/github_icon.svg) jonasrothfuss/promp](https://github.com/jonasrothfuss/promp) + [![Papers with Code](/images/pwc_icon.svg) 5 community implementations](https://paperswithcode.com/paper/?openreview=SkxXCi0qFX)

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 6 code implementations](https://www.catalyzex.com/paper/promp-proximal-meta-policy-search/code)

28 Replies

Loading