Abstract: Reinforcement learning in sparse-reward navigation environments with expensive and limited interactions is challenging and demands effective exploration. Motivated by complex navigation tasks that require real-world training (when cheap simulators are not available), we consider an agent that faces an unknown distribution of environments and must decide on an exploration strategy. It may leverage a series of training environments to improve its policy before it is evaluated in a test environment drawn from the same environment distribution. Most existing approaches focus on fixed exploration strategies, while the few that view exploration as a meta-optimization problem tend to ignore the need for _cost-efficient_ exploration. We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies. The algorithm adjusts a variety of levers --- the locations of the subgoals, the length of each episode, and the number of replications per trial --- in order to overcome the challenges of sparse rewards, expensive interactions, and noise. An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains. We also provide a theoretical foundation and prove that the method asymptotically identifies a near-optimal subgoal design.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: This is the camera-ready revision. We have:
- Added the random-subgoals baseline requested by Reviewer 1jSM; see the "RND" line in Fig. 7. As expected, random subgoals do not perform well.
- Removed the comparison to MAML, as requested by Reviewer 1jSM.
- Added a link to the GitHub repo hosting the open-source code.
We thank all the reviewers and the AC for a valuable review process.
Code: https://github.com/yjwang0618/subgoal-based-exploration
Assigned Action Editor: ~Adam_M_White1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 940