Policy Iteration with Gaussian Process based Value Function Approximation

Jun 09, 2020 (edited Jul 01, 2020) | RSS 2020 Workshop RobRetro Submission | Readers: Everyone
  • Keywords: Reinforcement Learning, Gaussian Processes
  • TL;DR: Experiments with Gaussian Processes as function approximators in TD learning algorithms
  • Abstract: In this work, we explore the use of Gaussian processes (GPs) as function approximators for Reinforcement Learning (RL), and build estimates of the value function and Q-function using GPs. Such a representation allows us to learn Q-functions, and thereby policies, conditioned on uncertainty in the system dynamics, and can be useful for sample-efficient transfer of policies learned in simulation to hardware. We use two approaches, GPTD and GPSARSA, to build approximate value functions and Q-functions, respectively. On simple, continuous problems, we found both to be effective at approximating the value function and the Q-function; on discontinuous landscapes, however, the performance of GPSARSA deteriorates, even on simple problems. As the problem complexity increases, for example for an inverted pendulum, we find that both approaches are extremely sensitive to the GP hyperparameters and do not scale well. We experiment with a sparse variant of the algorithm but find that GPSARSA still converges to poor solutions. Our experiments show that while GPTD and GPSARSA are appealing theoretical formulations, they are not suitable for complex domains without extensive hyperparameter tuning.
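
For readers unfamiliar with GPTD, the following is a minimal sketch of the batch form of the idea, not the authors' implementation. It assumes the standard GPTD observation model, in which each reward along a trajectory is a noisy observation of V(x_t) - gamma * V(x_{t+1}) and V has a zero-mean GP prior. The function names, the RBF kernel, the hyperparameter values, and the toy five-state cyclic problem are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of scalar states (assumed kernel)."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return np.exp(-0.5 * ((a - b) / length_scale) ** 2)

def gptd_posterior(states, rewards, query, gamma=0.9, noise_std=0.01):
    """Batch GPTD posterior over V at `query`, given one trajectory.

    states:  x_0 .. x_T      (T + 1 entries)
    rewards: r_0 .. r_{T-1}  (T entries), modeled as
             r_t = V(x_t) - gamma * V(x_{t+1}) + noise
    """
    states = np.asarray(states, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    assert len(states) == T + 1, "need one more state than rewards"

    # H maps values along the trajectory to expected rewards: (H v)_t = v_t - gamma * v_{t+1}
    H = np.zeros((T, T + 1))
    idx = np.arange(T)
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma

    K = rbf_kernel(states, states)                    # prior covariance of V on visited states
    G = H @ K @ H.T + noise_std ** 2 * np.eye(T)      # covariance of the observed rewards
    Hk = H @ rbf_kernel(states, query)                # cross-covariance of rewards and V(query)

    mean = Hk.T @ np.linalg.solve(G, rewards)
    # The prior variance k(x, x) equals 1 for this RBF kernel.
    var = 1.0 - np.sum(Hk * np.linalg.solve(G, Hk), axis=0)
    return mean, var

# Toy problem (hypothetical): five states visited in a fixed cycle, reward 1 at every step.
# The discounted return is then 1 / (1 - gamma) = 10 from every state.
states = np.arange(51) % 5
rewards = np.ones(50)
query = np.arange(5)
mean, var = gptd_posterior(states, rewards, query)
```

On this toy cycle the posterior mean should sit close to 10 for every state, shrunk slightly toward the zero prior mean. The shrinkage depends directly on `noise_std` and `length_scale`, which gives a small-scale view of the hyperparameter sensitivity the abstract reports. The solve over a T-by-T matrix is also why batch GPTD scales poorly with trajectory length and why sparse variants exist.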