Divide and Explore: Multi-Agent Separate Exploration with Shared Intrinsic Motivations

Xiao Jing; Zhenwei Zhu; Hongliang Li; Xin Pei; Yoshua Bengio; Tong Che; Hongyong Song

Published: 28 Jan 2022; Last Modified: 13 Feb 2023; ICLR 2022 Submitted; Readers: Everyone

Keywords: Deep Reinforcement Learning, Exploration, Intrinsic Motivation, Distributed Learning

Abstract: One of the greatest challenges in reinforcement learning is efficient exploration, especially when training signals are sparse or deceptive. The main difficulty lies in the size and complexity of the state space, which makes simple approaches such as exhaustive search infeasible. Our work builds on two observations. First, modern computing platforms scale to large numbers of nodes and cores and can complete asynchronous, well load-balanced computational tasks very quickly. Second, divide-and-conquer is a commonly used technique in computer science for similar problems (such as SAT) that require efficient search in extremely large state spaces. In this paper, we apply the idea of divide-and-conquer to intelligent exploration. The resulting exploration scheme can be combined with various intrinsic rewards designed for the given task. In our scheme, the learning algorithm automatically divides the state space into regions, and each agent is assigned to explore one of these regions. All agents run asynchronously and can be deployed on modern distributed computing platforms. Our experiments show that the proposed method is highly efficient and achieves state-of-the-art results on many RL tasks, such as MiniGrid and ViZDoom.

One-sentence Summary: Divide and Explore trains multiple concurrent exploring agents and uses shared intrinsic motivations to guide each agent toward a different region of the state space while the agents continue to explore the boundaries between regions.
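
The division of labor described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract says the regions are found automatically, whereas the sketch below assumes a fixed, hand-written partition of a small grid, and it swaps in a simple count-based novelty bonus as the intrinsic reward. The names `GRID_WIDTH`, `region_of`, `SharedCounts`, and `intrinsic_reward` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter

GRID_WIDTH = 8  # hypothetical grid size, chosen only for this example


def region_of(state, num_regions, width=GRID_WIDTH):
    """Assumed partition: split the grid into vertical strips by x-coordinate."""
    x, _ = state
    return x * num_regions // width


class SharedCounts:
    """Visit counts shared by all agents (stand-in for a shared intrinsic signal)."""

    def __init__(self):
        self.counts = Counter()

    def visit(self, state):
        self.counts[state] += 1
        return self.counts[state]


def intrinsic_reward(agent_id, state, shared, num_regions):
    """Count-based bonus, paid only inside the region assigned to this agent."""
    n = shared.visit(state)  # every visit updates the shared counts
    if region_of(state, num_regions) != agent_id:
        return 0.0  # no novelty bonus outside the agent's own region
    return 1.0 / math.sqrt(n)
```

With two agents, `intrinsic_reward(0, (0, 0), shared, 2)` pays a bonus to agent 0, while the same state yields no bonus for agent 1, so each agent is drawn toward its own strip. The sketch does not model boundary exploration or asynchronous execution.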
