Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: continuous time reinforcement learning, optimal control, Hamilton-Jacobi-Bellman equation, viscosity solutions, Physics-Informed Neural Networks
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Despite recent advances in Reinforcement Learning (RL), Markov Decision
Processes are not always the best choice for modeling complex dynamical systems
that require interactions at high frequency. Because it can work with arbitrary
time intervals, Continuous Time Reinforcement Learning (CTRL) is better suited to
such problems. In CTRL, the evolution of the value function is described by the
Hamilton-Jacobi-Bellman (HJB) equation rather than the discrete-time Bellman
equation. Although the value function solves the HJB equation, it is not
necessarily its unique solution. To distinguish the value function from other
solutions, one must seek the viscosity solution of the HJB equation; viscosity
solutions form a special class of solutions with uniqueness and stability
properties. This paper proposes a novel approach that approximates the value
function by training a Physics-Informed Neural Network (PINN) through a specific
$\epsilon$-scheduling iterative process constraining the PINN to converge towards
the viscosity solution, and presents experimental results on classical control
tasks.
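For context, the vanishing-viscosity idea that the $\epsilon$-scheduling alludes to can be sketched as follows; the dynamics $f$, reward $r$, and discount rate $\rho$ below are generic placeholders rather than the paper's notation, which is not given in the abstract. A stationary HJB equation of the form $\rho\, v(x) = \max_{a}\{\, r(x,a) + f(x,a)\cdot\nabla v(x) \,\}$ can be regularized as $\rho\, v_{\epsilon}(x) = \max_{a}\{\, r(x,a) + f(x,a)\cdot\nabla v_{\epsilon}(x) \,\} + \epsilon\, \Delta v_{\epsilon}(x)$, and the viscosity solution is recovered in the limit $\epsilon \to 0$; an $\epsilon$-schedule would then anneal $\epsilon$ towards zero over the PINN training iterations.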
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7487