BeBold: Exploration Beyond the Boundary of Explored Regions

28 Sept 2020 (modified: 22 Oct 2023), ICLR 2021 Conference Blind Submission, Readers: Everyone
Keywords: reinforcement learning, exploration
Abstract: Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR). There are many heuristics for IR, including visitation counts, curiosity, and state difference. In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for IR. The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. In comparison, the previous SoTA solves only 50% of the tasks. BeBold also achieves SoTA on multiple tasks in NetHack, a popular rogue-like game that contains more challenging procedurally-generated environments.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2012.08621/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=k1UzUXJRcv
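
The abstract names the intrinsic-reward criterion, a regulated difference of inverse visitation counts, without writing it out. The sketch below is one plausible reading, assuming that "regulated" means clipping negative differences to zero, that states are hashable, and that counts are kept over the agent's whole lifetime. The class and method names (`InverseCountBonus`, `observe`, `bonus`) are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


class InverseCountBonus:
    """Clipped difference of inverse visitation counts (illustrative sketch)."""

    def __init__(self):
        # Lifelong visitation counts N(s), keyed by a hashable state.
        self.counts = defaultdict(int)

    def observe(self, state):
        """Record one visit to `state`."""
        self.counts[state] += 1

    def bonus(self, state, next_state):
        """Count the visit to `next_state`, then return
        max(1/N(next_state) - 1/N(state), 0)."""
        if self.counts.get(state, 0) == 0:
            raise ValueError("call observe() on the starting state first")
        self.observe(next_state)
        diff = 1.0 / self.counts[next_state] - 1.0 / self.counts[state]
        # Assumption: "regulated" is read here as clipping at zero, so moving
        # back toward well-visited states earns no reward.
        return max(diff, 0.0)
```

Under this reading, a step from a state visited three times to a never-visited state earns a bonus of 1 - 1/3, while the step back earns zero, which is the sense in which the reward favors crossing the boundary of the explored region.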
