Abstract: Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper we introduce a new framework, in which states and actions evolve simultaneously, and show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they might be suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous-control algorithm Soft Actor-Critic in both real-time and non-real-time settings.
Code Link: https://github.com/rmst/rtrl
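The real-time setting described in the abstract can be illustrated with an environment wrapper in which the action selected at one step only takes effect at the next, and the currently executing action is appended to the observation. The sketch below is a minimal, hypothetical example assuming the older Gym-style interface; the class and method names are illustrative and not necessarily those used in the linked rtrl repository.

```python
import numpy as np
import gym


class RealTimeWrapper(gym.Wrapper):
    """Illustrative sketch of a real-time environment: the action passed to
    step() is deferred by one step, and the action currently being executed
    is concatenated to the observation, so states and actions evolve
    simultaneously (hypothetical names, not the rtrl API)."""

    def __init__(self, env):
        super().__init__(env)
        self.prev_action = np.zeros(env.action_space.shape)

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self.prev_action = np.zeros(self.env.action_space.shape)
        # Augmented state (s, a): the agent must choose its next action
        # while the previously selected one is still being executed.
        return np.concatenate([obs, self.prev_action])

    def step(self, action):
        # Apply the action selected at the previous step; the action passed
        # in now only influences the following transition.
        obs, reward, done, info = self.env.step(self.prev_action)
        self.prev_action = action
        return np.concatenate([obs, self.prev_action]), reward, done, info
```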
CMT Num: 1748