Keywords: Safe Multi-agent Reinforcement Learning, constrained policy optimisation, first-order optimisation
Abstract: In multi-agent reinforcement learning (MARL), high performance is crucial for a successful multi-agent system. At the same time, the ability to avoid unsafe actions is becoming an urgent requirement for real-life applications. Yet developing safety-aware methods for multi-agent systems remains challenging. In this work, we introduce a novel approach called Multi-Agent First Order Constrained Optimization in Policy Space (MAFOCOPS), which addresses the dual objectives of attaining satisfactory performance and enforcing safety constraints. Using data generated from the current policy, MAFOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. The update policy is then projected back into the parametric policy space to obtain a feasible policy. Notably, our method is first-order in nature, which makes it easy to implement, and it admits an approximate upper bound on the worst-case constraint violation. Empirical results show that our approach achieves strong performance while satisfying safety constraints on several safe MARL benchmarks.
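The two-step update described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single agent, a single state with three discrete actions, a tabular softmax policy, and an exponential-reweighting closed form for the nonparametric step. The names `nonparametric_update`, `project`, `lam`, and `nu`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def nonparametric_update(pi_old, adv_reward, adv_cost, lam=1.0, nu=0.5):
    """Step 1 (assumed closed form): reweight the current policy by
    exp((A_reward - nu * A_cost) / lam) and renormalise over actions."""
    logits = np.log(pi_old) + (adv_reward - nu * adv_cost) / lam
    logits = logits - logits.max()  # subtract max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def project(theta, pi_star, lr=0.5, steps=500):
    """Step 2: first-order projection onto the parametric family, here
    plain gradient descent on KL(pi_theta || pi_star) for a softmax policy."""
    for _ in range(steps):
        pi = softmax(theta)
        log_ratio = np.log(pi) - np.log(pi_star)
        kl = np.sum(pi * log_ratio)
        grad = pi * (log_ratio - kl)  # exact gradient of the KL w.r.t. theta
        theta = theta - lr * grad
    return theta

# Toy data (made up): action 0 has high reward advantage but also high cost.
pi_old = np.array([0.5, 0.3, 0.2])
adv_reward = np.array([1.0, 0.5, -0.5])
adv_cost = np.array([2.0, 0.0, -1.0])

pi_star = nonparametric_update(pi_old, adv_reward, adv_cost)
theta_new = project(np.log(pi_old), pi_star)
pi_new = softmax(theta_new)
print("target policy:   ", pi_star)
print("projected policy:", pi_new)
```

Only gradients of a KL objective are needed in the projection step, with no second-order terms, which is the sense in which such a scheme is first-order. In this toy case the tabular softmax can represent the target exactly, so the projection recovers it; with a restricted function class it would only approximate it.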
Supplementary Material: zip
Submission Number: 7005