Exploration by Uncertainty in Reward Space

27 Sept 2018 (modified: 05 May 2023) · ICLR 2019 Conference Blind Submission
Abstract: Efficient exploration plays a key role in reinforcement learning tasks. Commonly used dithering strategies, such as ε-greedy, explore the state-action space randomly, which can lead to a large demand for samples. In this paper, we propose an exploration method based on the uncertainty in reward space. The approach involves two policies: an exploration policy is used for exploratory sampling in the environment, and a benchmark policy is updated with the data provided by the exploration policy. The benchmark policy supplies the uncertainty in reward space, e.g., the TD-error, which guides the updates of the exploration policy. We apply our method to two grid-world environments and four Atari games. Experimental results show that our method improves learning speed and achieves better performance than baseline policies.
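The abstract only outlines the two-policy scheme, so the sketch below is a minimal illustration in a tabular setting, not the paper's actual algorithm: the softmax exploration rule, the |TD-error| bonus, and the coefficient `beta` are all assumptions introduced here to make the idea concrete.

```python
import numpy as np

# Illustrative sketch of the two-policy scheme from the abstract (tabular case).
# A benchmark Q-table learns from transitions gathered by an exploration policy,
# and the benchmark's TD-error (the "uncertainty in reward space") is assumed
# to act as a bonus that steers the exploration policy.

n_states, n_actions = 16, 4
alpha, gamma, beta = 0.1, 0.99, 1.0  # learning rate, discount, bonus weight (assumed)

q_bench = np.zeros((n_states, n_actions))    # benchmark policy's value estimates
q_explore = np.zeros((n_states, n_actions))  # exploration policy's preferences

def explore_action(s):
    # Softmax over exploration preferences (hypothetical choice of
    # exploration rule; the paper may parameterize this differently).
    p = np.exp(q_explore[s] - q_explore[s].max())
    p /= p.sum()
    return np.random.choice(n_actions, p=p)

def step_update(s, a, r, s_next):
    # TD-error of the benchmark policy: the uncertainty signal in reward space.
    td_error = r + gamma * q_bench[s_next].max() - q_bench[s, a]
    # Benchmark policy learns from the data provided by the exploration policy.
    q_bench[s, a] += alpha * td_error
    # Exploration policy is drawn toward transitions with large |TD-error|
    # (assumed form of the guidance signal).
    q_explore[s, a] += alpha * (beta * abs(td_error) - q_explore[s, a])
    return td_error

# Toy driver on a random deterministic MDP, just to exercise the updates.
rng = np.random.default_rng(0)
P = rng.integers(n_states, size=(n_states, n_actions))  # transition table
R = rng.random((n_states, n_actions))                   # fixed rewards
s = 0
for _ in range(1000):
    a = explore_action(s)
    s_next, r = int(P[s, a]), float(R[s, a])
    step_update(s, a, r, s_next)
    s = s_next
```

Under these assumptions, actions whose benchmark value estimates are still inaccurate keep producing large TD-errors, so the exploration policy preferentially revisits them until the benchmark's estimates converge.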
Keywords: Policy Exploration, Uncertainty in Reward Space
TL;DR: Exploration by Uncertainty in Reward Space