- Keywords: distributional reinforcement learning, perturbation, exploration
- Abstract: Distributional reinforcement learning aims to learn the distribution of returns under stochastic environments. Since the learned return distribution contains rich information about the stochasticity of the environment, previous studies have relied on descriptive statistics, such as the standard deviation, for optimism in the face of uncertainty. These prior works divide into risk-seeking and risk-averse methods, each of which imposes a one-sided tendency on risk. Unexpectedly, such approaches hinder convergence. In this paper, we propose a novel distributional reinforcement learning method that explores by randomizing the risk criterion so as to reach a risk-neutral optimal policy. First, we provide a perturbed distributional Bellman optimality operator that distorts the risk measure used in action selection. Second, we prove the convergence and optimality of the proposed method using a weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and converges to an optimal return distribution. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments, including Atari games.
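To make the core idea concrete, the following is a minimal sketch of action selection under a randomly drawn risk criterion. It is an illustrative instantiation only, not the paper's exact scheme: it assumes a QR-DQN-style critic that outputs per-action return quantiles, and uses CVaR with a level alpha drawn uniformly at random as the distorted risk measure, so that action selection varies between risk-averse (small alpha) and risk-neutral (alpha = 1) behavior across steps. The function and parameter names here (`randomized_risk_greedy_action`, `quantiles`) are hypothetical.

```python
import numpy as np

def randomized_risk_greedy_action(quantiles: np.ndarray, rng: np.random.Generator) -> int:
    """Greedy action under a randomly sampled risk criterion.

    quantiles: array of shape (n_actions, n_quantiles) holding the
    estimated return quantiles for each action (assumed from a
    QR-DQN-style distributional critic).

    Risk measure (illustrative assumption): CVaR_alpha with
    alpha ~ Uniform(1/n_quantiles, 1], i.e. the mean of the lowest
    ceil(alpha * n_quantiles) quantiles. alpha = 1 recovers the
    risk-neutral (expected-value) greedy action.
    """
    n_actions, n_quantiles = quantiles.shape
    alpha = rng.uniform(1.0 / n_quantiles, 1.0)      # random risk level per step
    k = max(1, int(np.ceil(alpha * n_quantiles)))    # number of low quantiles kept
    sorted_q = np.sort(quantiles, axis=1)            # enforce quantile ordering
    cvar = sorted_q[:, :k].mean(axis=1)              # distorted value per action
    return int(np.argmax(cvar))

# Usage: pick an action from randomly generated quantile estimates.
rng = np.random.default_rng(0)
quantiles = rng.normal(size=(4, 32))                 # 4 actions, 32 quantiles
action = randomized_risk_greedy_action(quantiles, rng)
```

Because the risk level is resampled at each decision, no single risk-averse or risk-seeking criterion persistently biases exploration, which matches the abstract's motivation for reaching a risk-neutral optimal policy.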