Model-Free Reinforcement Learning for Spatiotemporal Tasks Using Symbolic Automata

Published: 01 Jan 2023, Last Modified: 21 Aug 2025 · CDC 2023 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) is a popular paradigm for synthesizing controllers in environments modeled as Markov Decision Processes (MDPs). The RL formulation assumes that users define local rewards that depend only on the current state (and action), and learning algorithms seek control policies that maximize cumulative rewards along system trajectories. An implicit assumption in RL is that policies maximizing cumulative rewards are desirable because they meet the intended control objectives. However, most control objectives are global properties of system trajectories, and meeting them with local rewards requires a tedious, manual, and error-prone process of hand-crafting rewards. We propose a new algorithm for automatically inferring local rewards from high-level task objectives expressed as symbolic automata (SA); a symbolic automaton is a finite state machine whose edges are labeled with symbolic predicates over the MDP states. SA subsume many popular formalisms for expressing task objectives, such as discrete-time versions of Signal Temporal Logic (STL). We assume a model-free RL setting, i.e., no prior knowledge of the system dynamics. We give theoretical results establishing that an optimal policy learned with our shaped rewards also maximizes the probability of satisfying the given SA-based control objective. We empirically compare our approach with other RL methods that learn policies enforcing temporal logic and automata-based control objectives, and demonstrate that our approach outperforms these methods both in the number of iterations required for convergence and in the probability that the learned policy satisfies the SA-based objectives.
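
To make the symbolic-automaton formalism concrete, below is a minimal Python sketch of an SA monitor run alongside an MDP, together with one illustrative way to turn automaton progress into a local reward. The class name, the reach/avoid predicates, and the specific bonus values are assumptions for illustration only and are not taken from the paper's construction.

```python
# Minimal sketch of a symbolic automaton (SA) monitor over MDP states.
# Assumes a simple 2-D point-mass state represented as a dict; all names
# and numeric values here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Predicate = Callable[[dict], bool]  # symbolic predicate over the MDP state


@dataclass
class SymbolicAutomaton:
    initial: str
    accepting: Set[str]
    # transitions[q] is a list of (predicate, next_state) pairs
    transitions: Dict[str, List[Tuple[Predicate, str]]] = field(default_factory=dict)
    current: str = ""

    def reset(self) -> str:
        self.current = self.initial
        return self.current

    def step(self, mdp_state: dict) -> str:
        """Advance on one MDP state; stay in place if no edge predicate fires."""
        for predicate, nxt in self.transitions.get(self.current, []):
            if predicate(mdp_state):
                self.current = nxt
                break
        return self.current

    def is_accepting(self) -> bool:
        return self.current in self.accepting


# Example objective: "reach the goal region while avoiding the obstacle".
in_goal = lambda s: abs(s["x"] - 5.0) < 0.5 and abs(s["y"] - 5.0) < 0.5
in_obstacle = lambda s: abs(s["x"] - 2.5) < 1.0 and abs(s["y"] - 2.5) < 1.0

sa = SymbolicAutomaton(
    initial="q0",
    accepting={"q_goal"},
    transitions={
        "q0": [(in_obstacle, "q_fail"), (in_goal, "q_goal")],
        "q_goal": [],  # absorbing accepting state
        "q_fail": [],  # absorbing rejecting state
    },
)


def shaped_reward(prev_q: str, new_q: str, sa: SymbolicAutomaton) -> float:
    """One illustrative local reward on the product of MDP and automaton states:
    a terminal reward for acceptance, a penalty for rejection, and a small
    bonus for automaton progress. Not the paper's exact shaping scheme."""
    if new_q in sa.accepting:
        return 1.0
    if new_q == "q_fail":
        return -1.0
    return 0.1 if new_q != prev_q else 0.0


# Usage inside a model-free RL loop: after each environment step, feed the new
# MDP state to sa.step(...) and hand shaped_reward(...) to the learner instead
# of a hand-crafted reward.
prev_q = sa.reset()
new_q = sa.step({"x": 5.1, "y": 4.8})
print(shaped_reward(prev_q, new_q, sa))  # 1.0: the goal edge fired
```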