Abstract: Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. One direct policy search method that has been proposed is direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The VBRL algorithm estimates the policy parameters through variational Bayesian inference and therefore avoids the overfitting problem. In this paper, we propose an extension of the VBRL model that uses kernel methods, which we call K-VBRL. We assess the performance of the proposed K-VBRL in two experiments on the mountain car task. These experiments show that K-VBRL achieves a higher average return than the conventional VBRL and thus outperforms it.