Keywords: reinforcement learning, exploration, meta-learning
Abstract: A key challenge for reinforcement learning (RL) consists of learning in environments with sparse extrinsic rewards. In contrast to current RL methods, humans are able to learn new skills with little or no reward by using various forms of intrinsic motivation. We propose AMIGo, a novel agent incorporating -- as form of meta-learning -- a goal-generating teacher that proposes Adversarially Motivated Intrinsic Goals to train a goal-conditioned "student" policy in the absence of (or alongside) environment reward. Specifically, through a simple but effective "constructively adversarial" objective, the teacher learns to propose increasingly challenging -- yet achievable -- goals that allow the student to learn general skills for acting in a new environment, independent of the task to be solved. We show that our method generates a natural curriculum of self-proposed goals which ultimately allows the agent to solve challenging procedurally-generated tasks where other forms of intrinsic motivation and state-of-the-art RL methods fail.
One-sentence Summary: A "constructively adversarial" teacher-student setup can augment on-policy algorithms to better solve difficult exploration tasks in RL.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Code: [![github](/images/github_icon.svg) facebookresearch/adversarially-motivated-intrinsic-goals](https://github.com/facebookresearch/adversarially-motivated-intrinsic-goals) + [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=ETBc_MIMgoX)
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 5 code implementations](https://www.catalyzex.com/paper/arxiv:2006.12122/code)