Learned Shields for Multi-Agent Reinforcement Learning

Published: 01 Apr 2025, Last Modified: 22 Apr 2025 · ALA · CC BY 4.0
Keywords: Multi-Agent, Shielding, Safety
TL;DR: Enforce safety in multi-agent domains, without a model of the environment, by learning a shield
Abstract: Shielding is an effective method for ensuring safety in multi-agent domains; however, its applicability has previously been limited to environments for which an approximate discrete model and a safety specification are known in advance. We present a method for learning shields in cooperative, fully observable multi-agent environments where neither a model nor a safety specification is provided, using architectural constraints to realize several important properties of a shield. We show through a series of experiments that our learned shielding method significantly reduces safety violations while largely preserving the ability of the underlying reinforcement learning agents to optimize for reward.
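To make the shielding idea concrete: a shield sits between the agents' policy and the environment, passing a proposed joint action through when it is predicted safe and substituting a safer alternative otherwise. The sketch below is a minimal, hypothetical illustration of that interface, not the paper's actual architecture; the `safety_score` heuristic stands in for the learned safety critic the abstract alludes to, and all names and thresholds are assumptions.

```python
def safety_score(joint_obs, joint_action):
    # Stand-in for a learned safety critic: here a fixed heuristic that
    # scores a joint action by its total magnitude. In the learned setting
    # this would be a network trained from observed safety violations.
    return sum(abs(a) for a in joint_action)

def shield(joint_obs, proposed, candidates, threshold=1.5):
    """Pass the proposed joint action through if predicted safe;
    otherwise substitute the lowest-risk candidate action.
    (Hypothetical interface, not the paper's method.)"""
    if safety_score(joint_obs, proposed) <= threshold:
        return proposed  # predicted safe: no intervention
    # Predicted unsafe: fall back to the candidate with the lowest risk.
    return min(candidates, key=lambda a: safety_score(joint_obs, a))

obs = [0.0, 0.0]
proposed = [1.0, 1.0]                     # risky joint action: score 2.0 > 1.5
candidates = [[1.0, 1.0], [0.5, 0.5]]
print(shield(obs, proposed, candidates))  # -> [0.5, 0.5]
```

Because the shield only overrides actions it predicts to be unsafe, the underlying learner remains free to optimize reward whenever its proposals clear the safety check, which matches the trade-off the abstract reports.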
Type Of Paper: Work-in-progress paper (max 6 pages)
Anonymous Submission: Anonymized submission.
Submission Number: 42
