A Generalized Stacked Reinforcement Learning Method for Sampled Systems

Published: 01 Jan 2023, Last Modified: 29 Oct 2024. Venue: IEEE Trans. Autom. Control, 2023. License: CC BY-SA 4.0
Abstract: A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable for applications such as video games or puzzles, physical systems evolve in continuous time. A general variant of RL is digital, in the sense that updates of the value (or cost) and policy are performed at discrete moments in time. The agent–environment loop then amounts to a sampled system, of which sample-and-hold is a special case. In this article, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model predictive control with critics that learn the optimal Q-function and the optimal value (or cost-to-go) function. Optimality is analyzed, and performance is compared in an experimental case study with a mobile robot.
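To make the notion of a sampled agent–environment loop concrete, the following is a minimal sketch of the sample-and-hold case mentioned in the abstract. It is not the method proposed in the article: the function `simulate_sample_and_hold`, the scalar dynamics, the linear feedback policy, and all numerical values are hypothetical and chosen only for illustration. The environment is integrated with small explicit Euler steps as a stand-in for continuous time, while the action is recomputed only at sampling instants and held constant in between.

```python
def simulate_sample_and_hold(dynamics, policy, x0, sampling_time, n_samples, substeps=100):
    """Simulate a continuous-time system under a sample-and-hold controller.

    dynamics(x, u) returns the state derivative dx/dt (hypothetical model).
    policy(x) returns the action, evaluated only at sampling instants.
    """
    dt = sampling_time / substeps  # fine integration step inside one sampling period
    x = x0
    states, actions = [x0], []
    for _ in range(n_samples):
        u = policy(x)  # the agent acts only at the sampling instant
        for _ in range(substeps):
            # Explicit Euler step; the action u is held constant over the period.
            x = x + dt * dynamics(x, u)
        states.append(x)
        actions.append(u)
    return states, actions


# Hypothetical scalar example: dx/dt = -x + u with linear feedback u = -2x.
def dynamics(x, u):
    return -x + u


def policy(x):
    return -2.0 * x


states, actions = simulate_sample_and_hold(
    dynamics, policy, x0=1.0, sampling_time=0.1, n_samples=50
)
```

Here `states` holds the state at each sampling instant and `actions` holds the action held over the following period. In this toy setup the state contracts toward zero from one sampling instant to the next; a longer sampling time with the same gain would weaken or destroy that contraction, which is one reason the sampled setting is treated explicitly rather than as a plain discrete-time MDP.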