Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning

Chenjia Bai; Lingxiao Wang; Peng Liu; Zhaoran Wang; Jianye HAO; Yingnan Zhao

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone

Keywords: optimistic exploration, backward bootstrapped bonus, posterior sampling, reinforcement learning

Abstract: Optimism in the face of uncertainty is a principled approach to provably efficient exploration in tabular and linear reinforcement learning. However, turning this approach into practical exploration algorithms for Deep Reinforcement Learning (DRL) is challenging. To address this problem, we propose an Optimistic Exploration algorithm with Backward Bootstrapped Bonus (OEB3) that carries this principle over to DRL. OEB3 is built on bootstrapped deep $Q$-learning, a non-parametric posterior sampling method for temporally-extended exploration. On top of this temporally-extended exploration, we construct a UCB bonus that measures the uncertainty of the $Q$-functions. The UCB bonus is then used to estimate an optimistic $Q$-value, which encourages the agent to explore scarcely visited states and actions and thereby reduce uncertainty. When estimating the $Q$-function, we adopt an episodic backward update strategy that consistently propagates future uncertainty into the estimated $Q$-function. Extensive evaluations show that OEB3 outperforms several state-of-the-art exploration approaches on an MNIST maze and 49 Atari games.
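The abstract combines three ingredients: bootstrapped $Q$-heads, a UCB bonus derived from their disagreement, and an episodic backward update. The following is a minimal tabular sketch of how these pieces can fit together, not the authors' deep-network implementation. It assumes that uncertainty is measured as the standard deviation across the K heads, that the bonus is added to the reward inside the target, and that each head only learns from transitions its bootstrap mask selects. The names `ucb_bonus`, `act`, `backward_update`, `masks`, `beta`, `gamma`, and `lr` are hypothetical and chosen for illustration.

```python
import statistics
from typing import List, Sequence, Tuple

QTable = List[List[List[float]]]                # q[k][s][a]: estimate of head k for (s, a)
Transition = Tuple[int, int, float, int, bool]  # (s, a, r, s_next, done)


def ucb_bonus(q: QTable, s: int, a: int) -> float:
    """Uncertainty of Q(s, a), assumed here to be the std. deviation across heads."""
    return statistics.pstdev([head[s][a] for head in q])


def act(q: QTable, k: int, s: int, beta: float = 1.0) -> int:
    """Greedy action of head k (fixed for an episode) w.r.t. Q_k(s, a) + beta * bonus."""
    scores = [q[k][s][a] + beta * ucb_bonus(q, s, a) for a in range(len(q[k][s]))]
    return max(range(len(scores)), key=scores.__getitem__)


def backward_update(q: QTable, episode: Sequence[Transition],
                    masks: Sequence[Sequence[bool]],
                    gamma: float = 0.99, lr: float = 0.5, beta: float = 1.0) -> QTable:
    """Update every head along one episode, from the last transition to the first.

    Sweeping backward lets a reward observed at the end of the episode reach the
    first state in a single pass. masks[t][k] is the bootstrap mask: head k only
    learns from transition t when it is True.
    """
    for t in reversed(range(len(episode))):
        s, a, r, s_next, done = episode[t]
        bonus = beta * ucb_bonus(q, s, a)  # optimism: reward plus UCB bonus
        for k, head in enumerate(q):
            if not masks[t][k]:
                continue  # this head did not sample transition t
            next_value = 0.0 if done else max(head[s_next])
            target = r + bonus + gamma * next_value
            head[s][a] += lr * (target - head[s][a])
    return q


if __name__ == "__main__":
    # 2 heads, 3 states, 2 actions; the only reward arrives on the last step.
    q = [[[0.0] * 2 for _ in range(3)] for _ in range(2)]
    episode = [(0, 0, 0.0, 1, False), (1, 1, 1.0, 2, True)]
    backward_update(q, episode, [[True, True], [True, True]], gamma=0.9, lr=1.0)
    print(q[0][0][0])  # 0.9: the terminal reward reached step 0 in one pass
```

With a forward sweep over the same episode, step 0 would still read the stale value of state 1 and need a second pass; that gap is what the backward ordering closes. In the deep setting the table would be replaced by a shared network with K output heads, which this sketch does not attempt to model.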

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Reviewed Version (pdf): https://openreview.net/references/pdf?id=hSPIpC1MP0