Offline policy reuse-guided anytime online collective multiagent planning and its application to mobility-on-demand systems
Abstract: The growing popularity of mobility-on-demand (MoD) systems has boosted interest in online collective multiagent planning (Online_CMP), where spatially distributed servicing agents are planned to meet dynamically arriving demands. For city-scale MoD systems with a large fleet of agents, Online_CMP methods must trade off computation time (i.e., real-time performance) against solution quality (i.e., the number of demands served). Directly executing an offline policy guarantees real-time performance, but the policy cannot be dynamically adjusted to the actual agent and demand distributions. Search-based online planning methods are adaptive, but are computationally expensive and do not scale. In this paper, we propose a principled Online_CMP method that reuses and improves the offline policy in an anytime manner. We first model MoD systems as a collective Markov decision process (\({\mathbb {C}}\)-MDP) in which the collective behavior of agents affects the joint reward. Given the \({\mathbb {C}}\)-MDP model, we propose a novel state value function to evaluate the policy, and a gradient ascent (GA) technique to improve the policy. We further show that offline GA-based policy iteration (GA-PI) can converge to the global optimum of the \({\mathbb {C}}\)-MDP under certain conditions. Finally, given real-time information, the offline policy is used as the default plan and GA-PI improves it to generate an online plan. Experimental results show that our offline policy reuse-guided Online_CMP method significantly outperforms standard online multiagent planning methods on MoD problems such as ride-sharing and security traffic patrolling in terms of both computation time and solution quality.
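For intuition, the following is a minimal sketch of the anytime loop described in the abstract: start from the offline policy as the default plan and apply gradient-ascent improvement steps until the real-time budget runs out. All names here (`offline_policy`, `value_fn`, `grad_fn`, `time_budget`) are illustrative assumptions, not the paper's actual implementation.

```python
import time
import numpy as np

def anytime_online_plan(offline_policy, value_fn, grad_fn, time_budget, step_size=0.01):
    """Anytime online planning sketch (hypothetical interface):
    reuse the offline policy as the default plan, then improve it by
    gradient ascent on the collective state value until time runs out.

    offline_policy : (S, A) array of per-state action probabilities
    value_fn       : maps a policy to its estimated collective value
    grad_fn        : maps a policy to the gradient of that value
    """
    policy = offline_policy.copy()            # default plan: the offline policy
    best_policy, best_value = policy, value_fn(policy)

    deadline = time.time() + time_budget
    while time.time() < deadline:
        # one gradient-ascent improvement step on the policy parameters
        policy = policy + step_size * grad_fn(policy)
        # project back so each row stays a valid probability distribution
        policy = np.clip(policy, 1e-8, None)
        policy = policy / policy.sum(axis=1, keepdims=True)

        value = value_fn(policy)
        if value > best_value:                 # anytime behavior: keep the best plan so far
            best_policy, best_value = policy, value

    return best_policy
```

Because the loop always retains the best plan found so far, interrupting it at any point still returns a plan at least as good as the offline default, which is the essence of the anytime guarantee.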