Shield Decentralization for Safe Multi-Agent Reinforcement Learning

Daniel Melcer; Christopher Amato; Stavros Tripakis

Shield Decentralization for Safe Multi-Agent Reinforcement Learning

Daniel Melcer, Christopher Amato, Stavros Tripakis

Published: 31 Oct 2022, Last Modified: 11 Jan 2023NeurIPS 2022 AcceptReaders: Everyone

Keywords: safety, shielding, reinforcement learning, synthesis, multi agent

TL;DR: We describe a method of shield decomposition to enforce safety constraints in a communication-free multi-agent reinforcement learning setting.

Abstract: Learning safe solutions is an important but challenging problem in multi-agent reinforcement learning (MARL). Shielded reinforcement learning is one approach for preventing agents from choosing unsafe actions. Current shielded reinforcement learning methods for MARL make strong assumptions about communication and full observability. In this work, we extend the formalization of the shielded reinforcement learning problem to a decentralized multi-agent setting. We then present an algorithm for decomposition of a centralized shield, allowing shields to be used in such decentralized, communication-free environments. Our results show that agents equipped with decentralized shields perform comparably to agents with centralized shields in several tasks, allowing shielding to be used in environments with decentralized training and execution for the first time.

Supplementary Material: zip

12 Replies

Loading