OVD-Explorer: A General Information-theoretic Exploration Approach for Reinforcement Learning

Jinyi Liu; Zhi Wang; Yan Zheng; Jianye Hao; Junjie Ye; Chenjia Bai; Pengyi Li

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone

Keywords: Exploration, Uncertainty, Reinforcement Learning

Abstract: Many exploration strategies for reinforcement learning are built on the principle of optimism in the face of uncertainty (OFU). However, because they do not account for aleatoric uncertainty, existing methods may over-explore state-action pairs with high randomness and are therefore not robust. In this paper, we explicitly capture aleatoric uncertainty from a distributional perspective and propose an information-theoretic exploration method named Optimistic Value Distribution Explorer (OVD-Explorer). OVD-Explorer follows the OFU principle but, more importantly, avoids exploring areas with high aleatoric uncertainty by maximizing the mutual information between the policy and the upper bounds of the policy's returns. Furthermore, to make OVD-Explorer tractable for continuous RL, we derive a closed-form solution and integrate it with SAC; to our knowledge, this is the first approach to alleviate the negative impact of aleatoric uncertainty on exploration in continuous RL. Empirical evaluations on the commonly used MuJoCo benchmark and a novel GridChaos task demonstrate that OVD-Explorer can alleviate over-exploration and outperform state-of-the-art methods.

One-sentence Summary: We propose OVD-Explorer, an information-theoretic exploration method that follows the OFU principle and, more importantly, avoids exploring areas with high aleatoric uncertainty.

Supplementary Material: zip
