Constrained Multi-Agent Reinforcement Learning with MAF-Net for Safe Trajectory Planning

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, Venue: AAMAS 2026 Full, License: CC BY 4.0
Keywords: Multi-agent system, planning under uncertainty, deep reinforcement learning, decentralized decision making
TL;DR: We propose a multi-head action filter network integrated with decentralized reinforcement learning to enable safe and scalable multi-agent planning under uncertainty.
Abstract: Multi-agent trajectory planning under uncertainty in safety-critical systems faces two challenges: assuring safety and coordinating at scale. Traditional sampling-based and optimization-based methods lack real-time adaptability and do not scale well in multi-agent settings. Reinforcement learning methods show promise in addressing these issues, but often suffer from safety constraint violations and sample inefficiency. This paper proposes IDDPG-MAF, which integrates Independent Deep Deterministic Policy Gradient (IDDPG) with a pre-trained Multi-head Action Filter Network (MAF-Net). The task is first modeled as a constrained mixed-integer nonlinear programming (MINLP) problem and then reformulated as a constrained decentralized Markov decision process (Dec-MDP) to enable real-time adaptability and coordination. IDDPG facilitates decentralized policy learning for scalable multi-agent coordination, while MAF-Net serves as a differentiable safety filter that removes unsafe actions and penalizes suboptimal behaviors. IDDPG-MAF is applied to a complex multi-aircraft trajectory planning task, in which multiple high-speed aircraft agents must coordinate in real time, adapt to dynamic thunderstorm movements, and maintain safe separation under uncertainty. Experimental results show that IDDPG-MAF significantly improves safety, scalability, and learning efficiency over baselines. It maintains over 99% safe separation, compared with 82% for the state-of-the-art baseline, and achieves a 95.5% task success rate even under moderate position uncertainty. The method also scales safely to 45 aircraft within a compact spatiotemporal window, effectively doubling the maximum capacity of current operational practices.
Area: Search, Optimization, Planning, and Scheduling (SOPS)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1016
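
Illustrative note: the abstract describes an action filter that removes unsafe actions before they are executed. The sketch below shows that general idea only, under loud assumptions. It is a hand-written, rule-based stand-in, not MAF-Net: it is neither learned nor differentiable, and it does not penalize suboptimal behavior. The function names, the 2-D kinematics, the discrete candidate headings, and the parameters `speed`, `dt`, and `min_sep` are all hypothetical and do not come from the paper.

```python
import math


def next_position(pos, heading, speed, dt):
    """Advance a 2-D position along a heading (radians) for dt time units."""
    return (pos[0] + speed * dt * math.cos(heading),
            pos[1] + speed * dt * math.sin(heading))


def angular_gap(a, b):
    """Smallest absolute difference between two angles, with wrap-around."""
    d = a - b
    return abs(math.atan2(math.sin(d), math.cos(d)))


def filter_action(pos, proposed_heading, candidates, others,
                  speed=1.0, dt=1.0, min_sep=1.0):
    """Pick the safe candidate heading closest to the policy's proposal.

    A candidate is treated as safe (assumption) if the agent's next
    position keeps at least min_sep distance from every point in others.
    Returns None when no candidate is safe.
    """
    safe = []
    for h in candidates:
        nxt = next_position(pos, h, speed, dt)
        if all(math.dist(nxt, o) >= min_sep for o in others):
            safe.append(h)
    if not safe:
        return None
    return min(safe, key=lambda h: angular_gap(h, proposed_heading))


if __name__ == "__main__":
    headings = [-math.pi / 4, 0.0, math.pi / 4]
    # An intruder sits directly ahead, so flying straight is filtered out.
    print(filter_action((0.0, 0.0), 0.1, headings, [(1.0, 0.0)], min_sep=0.5))
```

In this toy setup the policy's proposed heading is replaced by the nearest candidate that keeps the assumed minimum separation. A learned filter such as the one the abstract describes would instead score actions with a network.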