Policy Iteration with Gaussian Process based Value Function Approximation

Jun 09, 2020 (edited Jul 01, 2020) · RSS 2020 Workshop RobRetro Submission
  • Keywords: Reinforcement Learning, Gaussian Processes
  • TL;DR: Experiments with Gaussian Processes as function approximators in TD learning algorithms
  • Abstract: In this work, we explore the use of Gaussian processes (GPs) as function approximators for Reinforcement Learning (RL), and build estimates of the value function and Q-function using GPs. Such a representation allows us to learn Q-functions, and thereby policies, conditioned on uncertainty in the system dynamics, and can be useful for sample-efficient transfer of policies learned in simulation to hardware. We use two approaches, GPTD and GPSARSA, to build approximate value functions and Q-functions respectively. While for simple, continuous problems we found these to be effective at approximating the value function and the Q-function, on discontinuous landscapes the performance of GPSARSA deteriorates, even for simple problems. As the problem complexity increases, for example, for an inverted pendulum, we find that both approaches are extremely sensitive to the GP hyperparameters and do not scale well. We experiment with a sparse variant of the algorithm but find that GPSARSA still converges to poor solutions. Our experiments show that while GPTD and GPSARSA are elegant theoretical formulations, they are not suitable for complex domains without extensive hyperparameter tuning.
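The core idea the abstract describes, representing the Q-function as a GP posterior over state-action pairs, can be illustrated with a minimal sketch. Note the actual GPTD/GPSARSA algorithms (Engel et al.) use a temporal-difference kernel construction over trajectories; the sketch below is plain GP regression on hypothetical return targets, with an RBF kernel and helper names (`rbf_kernel`, `gp_q_posterior`) chosen here for illustration only:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_q_posterior(X_train, q_targets, X_query, noise=1e-2):
    """Posterior mean and variance of Q at query state-action points,
    given noisy return targets at visited state-action pairs.
    (Illustrative GP regression, not the TD-kernel form of GPSARSA.)"""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_query, X_train)
    alpha = np.linalg.solve(K, q_targets)       # weights on training targets
    mean = K_star @ alpha
    cov = rbf_kernel(X_query, X_query) - K_star @ np.linalg.solve(K, K_star.T)
    return mean, np.diag(cov)

# Toy example: three state-action points in R^2 with scalar return targets.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
q = np.array([1.0, 0.0, 0.5])
mean, var = gp_q_posterior(X, q, X, noise=1e-6)
```

The posterior variance is what makes this representation attractive for uncertainty-aware policies; the abstract's sensitivity findings correspond to the `lengthscale` and `noise` hyperparameters above, which the exact GP has no mechanism to adapt online without explicit marginal-likelihood optimization.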