BeBold: Exploration Beyond the Boundary of Explored Regions

Tianjun Zhang; Huazhe Xu; Xiaolong Wang; Yi Wu; Kurt Keutzer; Joseph E. Gonzalez; Yuandong Tian

28 Sept 2020 (modified: 22 Jun 2025). ICLR 2021 Conference Blind Submission. Readers: Everyone

Keywords: reinforcement learning, exploration

Abstract: Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR). Many heuristics exist for IR, including visitation counts, curiosity, and state differences. In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for IR. The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. In comparison, the previous SoTA solves only 50% of these tasks. BeBold also achieves SoTA on multiple tasks in NetHack, a popular rogue-like game that contains even more challenging procedurally-generated environments.
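To make "regulated difference of inverse visitation counts" concrete, here is a minimal tabular sketch. It assumes the reward for a transition s -> s' is max(1/N(s') - 1/N(s), 0), where N is a lifelong visit count, and that the "regulation" also zeroes the reward unless s' is visited for the first time in the current episode. The class and method names (BeBoldIntrinsicReward, reset_episode, step) are hypothetical, and exact dictionary counts only make sense for small discrete state spaces; large ones would need some approximate count instead.

```python
from collections import Counter
from collections.abc import Hashable


class BeBoldIntrinsicReward:
    """Hypothetical tabular sketch of a BeBold-style intrinsic reward.

    Assumed form:
        r(s, s') = max(1/N(s') - 1/N(s), 0) * [N_episode(s') == 1]
    N counts visits over all episodes; N_episode counts visits in the
    current episode only.
    """

    def __init__(self) -> None:
        self.lifelong: Counter = Counter()  # N(s): visits across all episodes
        self.episodic: Counter = Counter()  # N_e(s): visits in this episode

    def _visit(self, state: Hashable) -> None:
        # Record one visit in both the lifelong and the episodic counters.
        self.lifelong[state] += 1
        self.episodic[state] += 1

    def reset_episode(self, start_state: Hashable) -> None:
        # Episodic counts restart; lifelong counts persist across episodes.
        self.episodic.clear()
        self._visit(start_state)

    def step(self, state: Hashable, next_state: Hashable) -> float:
        # Count the arrival at next_state before computing the reward.
        self._visit(next_state)
        diff = 1.0 / self.lifelong[next_state] - 1.0 / self.lifelong[state]
        # Reward only moves toward less-visited states, and only on the
        # first visit to next_state within the current episode.
        first_visit = self.episodic[next_state] == 1
        return max(diff, 0.0) if first_visit else 0.0


if __name__ == "__main__":
    ir = BeBoldIntrinsicReward()
    # Episode 1 walks a chain 0 -> 1 -> 2; every state is equally fresh.
    ir.reset_episode(0)
    print([ir.step(0, 1), ir.step(1, 2)])  # [0.0, 0.0]
    # Episode 2 revisits 0..2 and then steps past the old frontier into 3.
    ir.reset_episode(0)
    print([ir.step(0, 1), ir.step(1, 2), ir.step(2, 3)])  # [0.0, 0.0, 0.5]
```

In this toy run, the reward fires only when the agent crosses from the explored region (states 0 to 2, each visited twice) into the unvisited state 3. This matches the intuition of exploring beyond the boundary. The episodic first-visit check stops the agent from farming reward by bouncing back and forth across that boundary within one episode.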

Reviewed Version (pdf): https://openreview.net/references/pdf?id=k1UzUXJRcv