Revisiting Continuous-Time Reinforcement Learning: A Study of HJB Solvers Based on PINNs and FEMs

Published: 20 Jul 2023, Last Modified: 01 Sept 2023, EWRL16
Keywords: continuous time reinforcement learning, Hamilton-Jacobi-Bellman equation, viscosity solutions, Physics-Informed Neural Networks
Abstract: Despite recent advances in Reinforcement Learning (RL), Markov Decision Processes (MDPs) are not always the best choice for modelling complex dynamical systems that require interactions at high frequency. Because it can work with arbitrary time intervals, Continuous Time Reinforcement Learning (CTRL) is better suited to such problems. Instead of the Bellman equation operating in discrete time, it is the Hamilton-Jacobi-Bellman (HJB) equation that describes the evolution of the value function in CTRL. Although the value function is a solution of the HJB equation, it may not be its unique solution. To distinguish the value function from other solutions, one must look for the viscosity solutions of the HJB equation: a special class of solutions that possess uniqueness and stability properties. In this paper, we bring together the formalism of viscosity solutions and practical methods for finding them. We also propose a novel way of training neural networks to obtain viscosity solutions. Finally, we compare those methods with discrete-time RL (DTRL) algorithms to emphasize the benefits of the continuous-time setting. This paper aims to provide the necessary theoretical basis for working with CTRL and to set out a few possible directions for future research.
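
For context, one standard form of the stationary HJB equation for an infinite-horizon discounted control problem with deterministic dynamics reads as follows (the dynamics f, reward r, and discount rate rho are generic notation, not necessarily the formulation used in the paper):

\[ \rho\, V(x) = \max_{a \in A} \Big[ r(x, a) + \nabla V(x)^{\top} f(x, a) \Big]. \]

A PINN-style solver treats the difference between the two sides as a residual and minimizes its squared value at sampled collocation states. A minimal sketch of such a loss in PyTorch is given below; the names value_net, f, r, rho, and the finite candidate action set are illustrative assumptions and do not reflect the paper's proposed training scheme for viscosity solutions.

```python
# Illustrative sketch only: a generic PINN residual loss for the stationary HJB equation
#   rho * V(x) = max_a [ r(x, a) + <grad V(x), f(x, a)> ].
# The dynamics f, reward r, discount rho, and the finite action set are placeholder
# assumptions, not the formulation or training scheme proposed in the paper.
import torch

def hjb_residual_loss(value_net, x, actions, f, r, rho):
    """Mean squared HJB residual at collocation states x of shape (N, d)."""
    x = x.requires_grad_(True)
    V = value_net(x)                                                # (N, 1)
    gradV = torch.autograd.grad(V.sum(), x, create_graph=True)[0]   # (N, d)
    # Hamiltonian: maximize r(x, a) + <grad V(x), f(x, a)> over candidate actions.
    hams = torch.stack(
        [r(x, a) + (gradV * f(x, a)).sum(dim=1) for a in actions], dim=1
    )                                                               # (N, |A|)
    residual = rho * V.squeeze(1) - hams.max(dim=1).values          # (N,)
    return (residual ** 2).mean()
```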