Data-Efficient Exploration with Self Play for Atari

Published: 22 Jul 2021, Last Modified: 05 May 2023
Venue: URL 2021 Poster
Keywords: unsupervised reinforcement learning, exploration, self play
TL;DR: We introduce SelfPlayer, a new exploration algorithm that samples hard but achievable goals from the agent's past; SelfPlayer outperforms GoExplore and Curiosity on the data-efficient Atari benchmark.
Abstract: Most reinforcement learning (RL) algorithms rely on hand-crafted extrinsic rewards to learn skills. However, crafting a reward function for each skill is not scalable and results in narrow agents that learn reward-specific skills. To alleviate the reliance on reward engineering, it is important to develop RL algorithms capable of efficiently acquiring skills with no rewards extrinsic to the agent. While much progress has been made on reward-free exploration in RL, current methods struggle to explore efficiently. Self-play has long been a promising approach for acquiring skills, but most successful applications have been in multi-agent zero-sum games with extrinsic reward. In this work, we present SelfPlayer, a data-efficient single-agent self-play exploration algorithm. SelfPlayer samples hard but achievable goals from the agent’s past by maximizing a symmetric KL divergence between the visitation distributions of two copies of the agent, Alice and Bob. We show that SelfPlayer outperforms prior leading self-supervised exploration algorithms such as GoExplore and Curiosity on the data-efficient Atari benchmark.
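As an illustrative sketch only (the notation below is ours, not the paper's; the exact conditioning on goals is an assumption): if $d_A$ and $d_B$ denote the state-visitation distributions induced by the two copies of the agent, Alice and Bob, a symmetric KL objective of the kind described in the abstract can be written as
\[
\mathcal{J} \;=\; D_{\mathrm{KL}}\!\left(d_A \,\|\, d_B\right) \;+\; D_{\mathrm{KL}}\!\left(d_B \,\|\, d_A\right),
\qquad
D_{\mathrm{KL}}(p \,\|\, q) \;=\; \sum_{s} p(s)\,\log\frac{p(s)}{q(s)},
\]
with goals drawn from states in the agent's past so as to (approximately) maximize $\mathcal{J}$, i.e., goals that one copy reaches but the other does not yet reach reliably.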