LCPPO: An Efficient Multi-agent Reinforcement Learning Algorithm on Complex Railway Network

Published: 12 Feb 2024, Last Modified: 06 Mar 2024, ICAPS 2024, CC BY 4.0
Keywords: Multi-agent Reinforcement Learning, Applications of RL, Complex Railway Network, Planning
Abstract: Complex railway networks are challenging real-world multi-agent systems, often involving thousands of agents. Current planning methods depend heavily on expert knowledge to formulate solutions for specific cases and therefore generalize poorly to new scenarios, which has drawn significant attention to Multi-agent Reinforcement Learning (MARL). Despite successful applications in multi-agent decision-making tasks, MARL is hard to scale to large numbers of agents. This paper rethinks the curse of agents in the centralized-training-decentralized-execution paradigm and proposes a local-critic approach to address the issue. Combining the local critic with the PPO algorithm, we design a deep MARL algorithm denoted Local Critic PPO (LCPPO). In experiments, we evaluate the effectiveness of LCPPO on a complex railway network benchmark, Flatland, with various numbers of agents. Notably, LCPPO shows prominent generalizability and robustness under changes of environment.
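The abstract's central idea, conditioning each agent's critic on a local neighbourhood rather than the full joint state, can be illustrated with a minimal sketch. This is not the paper's implementation; the neighbourhood radius, the relative-coordinate encoding, and the function names below are illustrative assumptions, paired here with the standard PPO clipped surrogate that LCPPO builds on.

```python
import numpy as np

def local_observation(positions, agent_idx, radius=2.0):
    """Restrict one agent's critic input to agents within `radius`.

    Sketch of the local-critic idea: instead of a centralized critic that
    sees all N agents (whose input grows with N), each agent's critic sees
    only nearby agents, so the critic's input size stays bounded as the
    number of agents scales up. The radius and relative-coordinate encoding
    are illustrative choices, not taken from the paper.
    """
    deltas = positions - positions[agent_idx]
    dists = np.linalg.norm(deltas, axis=1)
    mask = dists <= radius
    # Return neighbours' positions relative to the ego agent.
    return positions[mask] - positions[agent_idx]

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s); `advantage` would come from the
    local critic above rather than a fully centralized one.
    """
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

With three agents at (0,0), (1,0), and (5,5) and radius 2, the first agent's local critic sees only itself and the second agent, regardless of how many distant agents exist; the clipped objective then caps how far a single update can move the policy.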
Primary Keywords: Applications, Learning, Multi-Agent Planning
Category: Long
Student: Graduate
Supplementary Material: zip
Submission Number: 252