Variational Bayesian Parameter-Based Policy Exploration

Published: 01 Jan 2020, Last Modified: 05 Jun 2025, IJCNN 2020, CC BY-SA 4.0
Abstract: Reinforcement learning has shown success in many tasks that cannot provide explicit training samples and can only provide rewards. However, because of a lack of robustness and the need for difficult hyperparameter tuning, reinforcement learning is not easily applicable to many new situations. One reason for this problem is that existing methods do not account for the uncertainties of rewards and policy parameters. In this paper, for parameter-based policy exploration, we use a Bayesian method to define an objective function that explicitly accounts for reward uncertainty. In addition, we provide an algorithm that uses a Bayesian method to optimize this function under the uncertainty of policy parameters in continuous state and action spaces. The results of numerical experiments show that the proposed method is more robust than the comparison method against estimation errors on finite samples, because our proposal balances reward acquisition and exploration.
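The parameter-based policy exploration the abstract builds on can be illustrated with a minimal sketch: instead of injecting noise into actions, the agent maintains a distribution over whole policy-parameter vectors, samples parameters per rollout, and updates the distribution toward higher return. The sketch below is a generic PGPE-style gradient ascent, not the paper's variational Bayesian algorithm; the quadratic reward function, learning rate, and sample counts are all illustrative assumptions.

```python
import numpy as np

# Illustrative stand-in for an episode's return: reward peaks when the
# policy parameter theta is near 2.0 (assumption, not from the paper).
def reward(theta: np.ndarray) -> float:
    return float(-np.sum((theta - 2.0) ** 2))

rng = np.random.default_rng(0)
mu = np.zeros(1)      # mean of the Gaussian over policy parameters
sigma = np.ones(1)    # per-dimension standard deviation
lr = 0.05             # learning rate (illustrative)

for _ in range(500):
    # Parameter-based exploration: sample whole parameter vectors and
    # evaluate the (deterministic) policy induced by each sample.
    thetas = mu + sigma * rng.standard_normal((10, 1))
    returns = np.array([reward(t) for t in thetas])
    adv = returns - returns.mean()   # baseline-subtracted returns
    eps = thetas - mu
    # PGPE-style stochastic gradient updates for mu and sigma.
    mu = mu + lr * (adv[:, None] * eps).mean(axis=0)
    sigma = sigma + lr * (adv[:, None] * ((eps**2 - sigma**2) / sigma)).mean(axis=0)
    sigma = np.maximum(sigma, 1e-3)  # keep the std strictly positive

print(mu, sigma)  # mu drifts toward 2.0; sigma shrinks as exploration pays off less
```

The point of contrast with action-space exploration is that the exploration noise here lives entirely in parameter space, so each sampled policy is deterministic within an episode; the paper's contribution, per the abstract, is to treat this parameter distribution and the reward itself in a Bayesian way rather than via the plain gradient ascent shown here.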