Training Agents to Satisfy Timed and Untimed Signal Temporal Logic Specifications with Reinforcement Learning

Nathaniel Hamilton, Preston Robinette, Taylor T. Johnson

Published: 01 Jan 2022, Last Modified: 13 Nov 2023, Venue: SEFM 2022, Readers: Everyone

Abstract: Reinforcement Learning (RL) depends critically on how reward functions are designed to capture intended behavior. However, traditional reward functions are often unable to represent temporal behavior, such as “do task 1 before doing task 2.” When they can represent temporal behavior, they are handcrafted by researchers and often require long hours of trial and error to shape the reward so that it produces the desired behavior. In these cases, the desired behavior is already known; the problem is generating a reward function that trains the RL agent to satisfy that behavior. To address this issue, we present an approach for automatically converting timed and untimed specifications into a reward function, implemented as the tool STLGym. In this work, we show how STLGym can be used to train RL agents to satisfy specifications better than traditional approaches and to refine learned behavior to better match the specification.
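
To make the idea of turning a temporal specification into a reward concrete, the following is a minimal sketch and not STLGym's API. It assumes the widely used quantitative (robustness) semantics for Signal Temporal Logic, where "always" takes a minimum and "eventually" takes a maximum of a predicate's margin over a trace; the paper's own construction may differ. The function names, the 1-D signal, and the threshold are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helper names; this is an illustration, not STLGym's interface.
Window = Optional[Tuple[int, int]]  # inclusive step bounds [a, b]; None = untimed


def margins(trace: Sequence[float], threshold: float) -> list:
    """Margin of the predicate `x > threshold` at each step (positive = satisfied)."""
    return [x - threshold for x in trace]


def _slice(values: Sequence[float], window: Window) -> Sequence[float]:
    """Restrict the trace to the window; assumes the window overlaps the trace."""
    if window is None:
        return values
    lo, hi = window
    return values[lo:hi + 1]


def always(values: Sequence[float], window: Window = None) -> float:
    """Robustness of 'always': the worst-case margin over the window."""
    return min(_slice(values, window))


def eventually(values: Sequence[float], window: Window = None) -> float:
    """Robustness of 'eventually': the best-case margin over the window."""
    return max(_slice(values, window))


# A hypothetical episode trace of a 1-D signal, with predicate x > 1.0.
trace = [0.0, 0.5, 1.5, 2.0]
m = margins(trace, threshold=1.0)

reward_untimed = eventually(m)               # "eventually x > 1.0"
reward_timed = eventually(m, window=(0, 1))  # "eventually within steps 0..1"
```

Under these assumptions, a positive value means the trace satisfies the specification and a negative value means it violates it, so the value can serve directly as an episode reward: the untimed specification is satisfied here, while the timed variant is not because the signal crosses the threshold too late.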
