## S2VG: Soft Stochastic Value Gradient method

Sep 25, 2019 Blind Submission readers: everyone
• Abstract: Model-based reinforcement learning (MBRL) has shown advantages in sample efficiency over model-free reinforcement learning (MFRL). Despite its impressive results, it still faces a trade-off between the ease of data generation and model bias. In this paper, we propose a simple and elegant model-based reinforcement learning algorithm called the soft stochastic value gradient method (S2VG). S2VG combines the merits of maximum-entropy reinforcement learning and MBRL, and exploits both real and imaginary data. In particular, we embed the model in the policy training and learn the $Q$ and $V$ functions from the real (or imaginary) data set. This embedding lets us compute an analytic policy gradient through back-propagation rather than through likelihood-ratio estimation, which can reduce the variance of the gradient estimate. We name our algorithm the Soft Stochastic Value Gradient method to indicate its connection with the well-known stochastic value gradient method of Heess et al. (2015).
• Code: https://github.com/S2VG-anonymous1/S2VG
• Keywords: Model-based reinforcement learning, soft stochastic value gradient
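
The abstract's claim that a back-propagated (pathwise) policy gradient can have lower variance than a likelihood-ratio estimate can be illustrated on a toy problem. The sketch below is not the S2VG algorithm and is not taken from the linked repository. It assumes a one-dimensional Gaussian policy $a \sim \mathcal{N}(\mu, \sigma^2)$ and a hypothetical stand-in critic $Q(a) = -(a-2)^2$. The names `q_value`, `q_grad`, and `gradient_estimates` are illustrative only. Both estimators target the same quantity, $\nabla_\mu \mathbb{E}[Q(a)] = -2(\mu - 2)$, which equals 4 at $\mu = 0$.

```python
import numpy as np


def q_value(a):
    # Hypothetical stand-in for a learned Q function (assumption, not from the paper).
    return -(a - 2.0) ** 2


def q_grad(a):
    # Analytic derivative dQ/da of the stand-in critic above.
    return -2.0 * (a - 2.0)


def gradient_estimates(mu=0.0, sigma=1.0, n=200_000, seed=0):
    """Return per-sample estimates of d/dmu E[Q(a)] for a ~ N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    a = mu + sigma * eps  # reparameterized action sample

    # Pathwise (back-propagated) estimator: dQ/da * da/dmu, with da/dmu = 1.
    pathwise = q_grad(a)

    # Likelihood-ratio (score-function) estimator: Q(a) * d/dmu log N(a; mu, sigma^2).
    likelihood_ratio = q_value(a) * (a - mu) / sigma**2

    return pathwise, likelihood_ratio


pathwise, likelihood_ratio = gradient_estimates()
print("pathwise:         mean", pathwise.mean(), "variance", pathwise.var())
print("likelihood ratio: mean", likelihood_ratio.mean(), "variance", likelihood_ratio.var())
```

In this toy setting both estimators are unbiased, so their sample means agree, while the per-sample variance of the likelihood-ratio estimator is substantially larger. The gap depends on the chosen critic and policy, so the example should be read as an illustration of the mechanism rather than as evidence for the paper's results.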