Multi-Agent Reinforcement Learning Guided by Signal Temporal Logic Specifications

Published: 2025, Last Modified: 25 Jan 2026, IROS 2025, CC BY-SA 4.0
Abstract: Reward design is a key component of deep reinforcement learning (DRL), yet some tasks and designer objectives are difficult to express naturally as a scalar cost function. Among the techniques proposed to address this, integrating formal methods with DRL has attracted considerable attention, because formal specifications can define rewards and requirements over the agent's states and actions expressively and flexibly. However, the use of Signal Temporal Logic (STL) to guide reward design in multi-agent reinforcement learning (MARL) remains largely unexplored, and the complex interactions, heterogeneous goals, and critical safety requirements of multi-agent systems make the problem harder still. In this paper, we propose a novel STL-guided MARL framework. The STL requirements comprise task specifications, derived from each agent's objective, and safety specifications. The robustness values obtained by evaluating the agents' states against these STL specifications are used to generate rewards. We validate our approach through experiments on several testbeds. The results show significant performance improvements over MARL without STL guidance, together with a substantial increase in the overall safety rate of the multi-agent systems.
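The abstract does not spell out how robustness values become rewards. The Python below is a minimal sketch that assumes the standard quantitative (robustness) semantics of STL over a finite, discrete-time trajectory. Under these semantics a positive value means the specification is satisfied and a negative value means it is violated, with the magnitude giving the margin. Each agent gets a hypothetical task specification (eventually come within `eps` of its own goal) and a hypothetical safety specification (always stay at least `d_safe` from every other agent). The robustness of their conjunction, evaluated once at t = 0 over the episode, is used as that agent's reward. The helper names (`predicate`, `eventually`, `always`, `agent_spec`, `stl_rewards`), the parameters `eps` and `d_safe`, the clipping of time windows at the end of the trace, and this reward rule are all illustrative assumptions, not the paper's method.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
State = Tuple[Point, ...]                 # positions of all agents at one time step
Signal = Sequence[State]                  # finite trajectory s_0, ..., s_{T-1}
Formula = Callable[[Signal, int], float]  # (signal, t) -> robustness rho(phi, s, t)


def predicate(f: Callable[[State], float]) -> Formula:
    """Atomic predicate mu := f(s) >= 0, whose robustness is f(s_t)."""
    return lambda sig, t: f(sig[t])


def conj(*phis: Formula) -> Formula:
    """phi_1 AND ... AND phi_n: robustness is the minimum over the operands."""
    return lambda sig, t: min(phi(sig, t) for phi in phis)


def _window(sig: Signal, t: int, a: int, b: int) -> range:
    # Assumed finite-trace convention: clip the upper bound to the last sample.
    lo, hi = t + a, min(t + b, len(sig) - 1)
    if lo > hi:
        raise ValueError("time window lies outside the signal")
    return range(lo, hi + 1)


def eventually(phi: Formula, a: int, b: int) -> Formula:
    """F_[a,b] phi: maximum robustness of phi over the window."""
    return lambda sig, t: max(phi(sig, k) for k in _window(sig, t, a, b))


def always(phi: Formula, a: int, b: int) -> Formula:
    """G_[a,b] phi: minimum robustness of phi over the window."""
    return lambda sig, t: min(phi(sig, k) for k in _window(sig, t, a, b))


def agent_spec(i: int, goal: Point, eps: float, d_safe: float, horizon: int) -> Formula:
    """Hypothetical spec for agent i: reach its goal AND keep clear of all others.

    The safety predicate needs at least two agents in each state.
    """
    reach = eventually(predicate(lambda s: eps - math.dist(s[i], goal)), 0, horizon)
    safe = always(
        predicate(lambda s: min(math.dist(s[i], s[j])
                                for j in range(len(s)) if j != i) - d_safe),
        0, horizon)
    return conj(reach, safe)


def stl_rewards(sig: Signal, goals: List[Point],
                eps: float = 0.5, d_safe: float = 1.0) -> List[float]:
    """Assumed reward rule: each agent's reward is its spec robustness at t = 0."""
    horizon = len(sig) - 1
    return [agent_spec(i, g, eps, d_safe, horizon)(sig, 0) for i, g in enumerate(goals)]


if __name__ == "__main__":
    # Two agents moving in parallel, 2.0 apart; agent 1's goal is one unit off its path.
    trajectory = [
        ((0.0, 0.0), (0.0, 2.0)),
        ((1.0, 0.0), (1.0, 2.0)),
        ((2.0, 0.0), (2.0, 2.0)),
    ]
    goals = [(2.0, 0.0), (2.0, 3.0)]
    print(stl_rewards(trajectory, goals))  # [0.5, -0.5]
```

In this example agent 0 reaches its goal exactly, so its task robustness is 0.5. It also stays 2.0 from agent 1, giving a safety robustness of 1.0, so its reward is min(0.5, 1.0) = 0.5. Agent 1 ends 1.0 from its goal, so its task robustness is 0.5 - 1.0 = -0.5, and that becomes its reward. Because robustness is a signed margin rather than a 0/1 satisfaction flag, it indicates how close an agent came to meeting a specification, not only whether it did. Denser per-step variants are also possible; this sketch evaluates the whole trajectory only once.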