Cross-Region Courier Displacement for On-Demand Delivery With Multi-Agent Reinforcement Learning

Shuai Wang, Shijie Hu, Baoshen Guo, Guang Wang

Published: 2023, Last Modified: 01 Feb 2024, IEEE Trans. Big Data 2023, Readers: Everyone

Abstract: On-demand delivery has become a prevalent way for people to order meals and groceries online, especially during the pandemic. It is essential to dispatch massive numbers of orders to a limited number of couriers to satisfy on-demand delivery users, especially during peak hours. Existing studies mainly focus on order dispatching within a region, and they are difficult to apply to the cross-region courier displacement problem due to (1) unique practical factors, including regional spatial-temporal demand-supply dynamics and strict delivery time constraints, and (2) the large-scale setting and high-dimensional decision space given the massive number of couriers in on-demand delivery. To address these challenges, in this work we propose an efficient cross-region courier displacement framework, Courier Displacement Reinforcement Learning (CDRL), based on centralized multi-agent actor-critic. It first designs the actor-critic network with a time-varying displacement intensity control module to capture demand-supply dynamics, and utilizes the centralized training and decentralized execution multi-agent framework to address large-scale coordination. One month of real-world order records collected from one of the biggest on-demand delivery services in the world is used to evaluate the performance of our design. The extensive results show that our method offers a 47.97% increase in balancing supply and demand and simultaneously reduces idle ride time by 24.62%.
