Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

Songyuan Zhang; Oswin So; Mitchell Black; Chuchu Fan

Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

Songyuan Zhang, Oswin So, Mitchell Black, Chuchu Fan

Published: 22 Jan 2025, Last Modified: 13 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: control barrier functions, multi-agent systems, black-box systems, partial observability, reinforcement learning

TL;DR: We propose DGPPO for solving multi-agent safe optimal control problem with unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, without a known performant nominal policy.

Abstract: Control policies that can achieve high task performance and satisfy safety constraints are desirable for any system, including multi-agent systems (MAS). One promising technique for ensuring the safety of MAS is distributed control barrier functions (CBF). However, it is difficult to design distributed CBF-based policies for MAS that can tackle unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, especially when a distributed high-performance nominal policy that can achieve the task is unavailable. To tackle these challenges, we propose **DGPPO**, a new framework that *simultaneously* learns both a *discrete* graph CBF which handles neighborhood changes and input constraints, and a distributed high-performance safe policy for MAS with unknown discrete-time dynamics. We empirically validate our claims on a suite of multi-agent tasks spanning three different simulation engines. The results suggest that, compared with existing methods, our DGPPO framework obtains policies that achieve high task performance (matching baselines that ignore the safety constraints), and high safety rates (matching the most conservative baselines), with a *constant* set of hyperparameters across all environments.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 12158

Loading