Intrinsically Guided Exploration in Meta Reinforcement Learning

28 Sept 2020 (modified: 05 May 2023)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Keywords: Meta reinforcement learning, Exploration, Information gain
Abstract: Deep reinforcement learning algorithms generally require large amounts of data to solve a single task. Meta reinforcement learning (meta-RL) agents learn to adapt to novel unseen tasks with high sample efficiency by extracting useful prior knowledge from previous tasks. Despite recent progress, efficient exploration during meta-training and adaptation remains a key challenge in sparse-reward meta-RL tasks. To address this problem, we propose a novel off-policy meta-RL algorithm that disentangles exploration and exploitation policies and learns intrinsically motivated exploration behaviors. We design novel intrinsic rewards derived from information gain to reduce task uncertainty and encourage the explorer to collect informative trajectories about the current task. Experimental evaluation shows that our algorithm achieves state-of-the-art performance on various sparse-reward MuJoCo locomotion tasks and more complex Meta-World tasks.
One-sentence Summary: We propose a novel meta-RL algorithm that incorporates information-theoretic intrinsic motivations, and achieves efficient exploration in both meta-training and adaptation.
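Illustrative Sketch: The abstract describes intrinsic rewards derived from information gain that reduce task uncertainty. A minimal sketch of one common way to realize such a reward is shown below, assuming a diagonal-Gaussian belief over a latent task variable (as in PEARL-style context encoders); the function and variable names here are hypothetical and not necessarily the paper's exact formulation.

```python
# Hypothetical sketch: reward the explorer by how much a new transition shifts
# a Gaussian belief over the latent task variable (information gain measured
# as a KL divergence between the updated and prior task posteriors).
import torch
import torch.distributions as D

def task_posterior(encoder, context):
    """Map a batch of context transitions to a diagonal-Gaussian task belief.

    `encoder` is an assumed module returning (mean, log-std) of the belief.
    """
    mu, log_std = encoder(context)
    return D.Normal(mu, log_std.exp())

def information_gain_reward(encoder, context, new_transition):
    """Intrinsic reward = KL(belief after new transition || belief before)."""
    prior_belief = task_posterior(encoder, context)
    updated_belief = task_posterior(
        encoder, torch.cat([context, new_transition], dim=0)
    )
    # Sum the per-dimension KL of the factorized Gaussians to get a scalar reward.
    return D.kl_divergence(updated_belief, prior_belief).sum().item()
```

Under this reading, transitions that sharpen the agent's belief about which task it is in yield high intrinsic reward, which matches the stated goal of encouraging the explorer to collect informative trajectories.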
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=ybtMLYiES