Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Taehyun Cho; Seungyub Han; Heesoo Lee; Kyungjae Lee; Jungwoo Lee

Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Taehyun Cho, Seungyub Han, Heesoo Lee, Kyungjae Lee, Jungwoo Lee

Published: 21 Sept 2023, Last Modified: 20 Dec 2023NeurIPS 2023 posterEveryoneRevisionsBibTeX

Keywords: distributional reinforcement learning, risk

Abstract: Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning that selects actions by randomizing risk criterion without losing the risk-neutral objective. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure. Also,we prove the convergence and optimality of the proposed method with the weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments including Atari 55 games.

Supplementary Material: zip

Submission Number: 9788

Loading