MAS$^3$AC: A Learning Framework for General Multi-Agent Safe and Stable Control with State-Wise Guarantees
Keywords: multi-agent systems, input-to-state stability, neural certificates and guarantees, issue of infeasibility, model-free reinforcement learning
TL;DR: We propose MAS$^3$AC, a framework for state-wise safe and stable control in general MARL tasks under unknown dynamics. The method outperforms baselines, striking a strong balance between reward maximization and constraint satisfaction.
Abstract: Ensuring both safety and stability is essential when deploying reinforcement learning to control safety-critical systems, including multi-agent ones. However, many existing safe multi-agent reinforcement learning (MARL) studies focus only on cooperative tasks and adopt the constrained Markov decision process setting, which enforces only expectation-based constraints. This limits their applicability to domains requiring strict state-wise guarantees, and stability remains largely underexplored. To address these challenges, we propose Multi-Agent Safe and Stable Soft Actor-Critic (MAS$^3$AC), a model-free framework that incorporates state-wise safety and stability constraints into MARL for general multi-agent tasks where each agent has its own objective. Our approach uses neural barrier functions to enforce safety, and we provide a theoretical analysis of the method's convergence to a feasible local Nash equilibrium. It then uses the concept of input-to-state stability to guarantee stability for the multi-agent system, together with an analysis of the infeasibility that arises from conflicting state-wise safety and stability requirements. Empirically, we introduce a benchmark suite for safe and stable MARL algorithms that spans both cooperative tasks with global information and non-cooperative tasks with local observations. Experimental results show that MAS$^3$AC consistently achieves a favorable balance between reward maximization and constraint satisfaction, delivering competitive or superior rewards with fewer safety violations than the baselines across these benchmarks.
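For orientation, the contrast the abstract draws between expectation-based and state-wise constraints can be written in standard textbook notation. This is only a sketch: the symbols below (per-agent cost $c_i$, budget $d_i$, discount $\gamma_d$, barrier function $h_i$, rate $\alpha$, disturbance $w$, comparison functions $\beta$ and $\gamma$) are assumptions chosen for illustration and are not taken from the paper, whose exact formulations may differ.

```latex
% Expectation-based (CMDP-style) constraint for agent i:
% only the expected discounted cost is bounded, so individual states may still be unsafe.
\[
  \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma_d^{\,t}\, c_i(s_t, a_t)\Big] \le d_i
\]

% State-wise safety: every visited state must lie in the safe set \{ s : h_i(s) \ge 0 \}.
\[
  h_i(s_t) \ge 0 \quad \text{for all } t \ge 0
\]

% A common discrete-time barrier condition that keeps the safe set invariant:
% if h_i(s_0) \ge 0, then h_i(s_{t+1}) \ge (1-\alpha)\, h_i(s_t) \ge 0 by induction.
\[
  h_i(s_{t+1}) - h_i(s_t) \ge -\alpha\, h_i(s_t), \qquad \alpha \in (0, 1]
\]

% Input-to-state stability (discrete time): the state is bounded by a decaying
% function of the initial condition plus a gain on the largest input or disturbance.
\[
  \|s_t\| \le \beta(\|s_0\|, t) + \gamma\Big(\sup_{0 \le k < t} \|w_k\|\Big),
  \qquad \beta \in \mathcal{KL},\ \gamma \in \mathcal{K}
\]
```

Under these assumed definitions, a safety requirement that keeps the state away from a region and a stability requirement that draws it toward an equilibrium inside or beyond that region can both be demanded at the same state, which is one way the conflict mentioned in the abstract could arise.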
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7865