Time-Constrained Actor-Critic Reinforcement Learning for Concurrent Order Dispatch in On-Demand Delivery

Published: 01 Jan 2024. Last Modified: 10 Dec 2024. Venue: IEEE Trans. Mob. Comput. 2024. License: CC BY-SA 4.0.
Abstract: On-demand delivery has grown rapidly in recent years, changing people's lifestyles through its timeliness and convenience. Order dispatch in on-demand delivery is concurrent: couriers continuously accept new orders and deliver them to customers under strict time constraints and dynamically changing demand and supply. Most existing order dispatch mechanisms are designed either for independent dispatch or for concurrent dispatch without strict deadlines, which makes them unsuitable for real-time concurrent dispatch in on-demand delivery. To address this challenge, we propose TCAC-Dispatch, a Time-Constrained Actor-Critic (TCAC) reinforcement learning based concurrent dispatch system designed to reduce the overdue rate and increase long-term revenue. Specifically, we first design a deep matching network (DMN) with a variable action space, which integrates state embeddings (including route behavior encoding) and action embeddings into a long-term value for dispatch decisions. Additionally, we design a time-constrained action pruning module to ensure compliance with time constraints. We then use the Actor-Critic framework to handle concurrent dispatch under strict time constraints and stochastic demand and supply. To further improve efficiency and delivery resource utilization, we propose an extension of TCAC, called TCAC+, which consists of (i) a learning-based order service time prediction module that determines whether to relax the deadlines of some orders; and (ii) a multi-critic framework that uses a dynamic weighting mechanism to optimize concurrent dispatch of orders with both tight and relaxed deadlines. We evaluate TCAC-Dispatch on one month of data, comprising 36.48 million orders and 42,000 couriers, collected from Eleme, one of the largest on-demand delivery companies in China.
Experiments are conducted on a data-driven emulator deployed in Eleme's development environment, and the results demonstrate that our method outperforms state-of-the-art baselines on various metrics in both tight-deadline and mixed-deadline scenarios.
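
The abstract does not give implementation details, so the following is only a minimal Python sketch, under our own assumptions, of how time-constrained action pruning and a weighted multi-critic score might fit together in one dispatch step. All names (`Courier`, `Order`, `prune_actions`, `dynamic_weights`, `tight_critic`, `relaxed_critic`, `dispatch`) are hypothetical. The two critics are hand-written stand-ins for learned value networks such as the DMN, the weighting rule (share of relaxed-deadline orders in the batch) is an assumption rather than the paper's mechanism, and the final choice is a simple greedy pick over critic scores, not the authors' actor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Courier:
    courier_id: str
    available_at: float  # hypothetical: minute at which the courier can start this order
    travel_time: float   # hypothetical: estimated minutes to pick up and deliver it


@dataclass
class Order:
    order_id: str
    deadline: float         # minute by which the order should be delivered
    relaxed: bool = False   # hypothetical flag, e.g. set by a service-time predictor
    slack: float = 0.0      # hypothetical extra minutes allowed when relaxed


Critic = Callable[[Order, Courier], float]


def prune_actions(order: Order, couriers: List[Courier]) -> List[Courier]:
    """Keep only couriers whose estimated completion meets the (possibly relaxed) deadline."""
    limit = order.deadline + (order.slack if order.relaxed else 0.0)
    return [c for c in couriers if c.available_at + c.travel_time <= limit]


def dynamic_weights(batch: List[Order]) -> Dict[str, float]:
    """Assumed rule: weight the relaxed-deadline critic by the share of relaxed orders."""
    p = sum(o.relaxed for o in batch) / len(batch) if batch else 0.0
    return {"tight": 1.0 - p, "relaxed": p}


def tight_critic(order: Order, courier: Courier) -> float:
    # Stand-in for a learned critic: prefer the earliest completion time.
    return -(courier.available_at + courier.travel_time)


def relaxed_critic(order: Order, courier: Courier) -> float:
    # Stand-in for a learned critic: prefer short trips to save courier capacity.
    return -courier.travel_time


def dispatch(order: Order, couriers: List[Courier],
             critics: Dict[str, Critic], weights: Dict[str, float]) -> Optional[str]:
    """Greedily pick the feasible courier with the highest weighted critic score."""
    feasible = prune_actions(order, couriers)  # variable-size action space
    if not feasible:
        return None  # no feasible courier: defer the order to a later dispatch round

    def score(c: Courier) -> float:
        return sum(weights[name] * critic(order, c) for name, critic in critics.items())

    return max(feasible, key=score).courier_id


# Usage with toy numbers (all hypothetical).
couriers = [Courier("c1", 0.0, 25.0), Courier("c2", 10.0, 12.0), Courier("c3", 5.0, 40.0)]
critics = {"tight": tight_critic, "relaxed": relaxed_critic}
order = Order("o1", deadline=30.0)
print(dispatch(order, couriers, critics, dynamic_weights([order])))
```

Pruning runs before scoring, so the critics only ever rank couriers that can meet the deadline; here `c3` (finishing at minute 45) is removed, and `c2` wins because it finishes earliest among the remaining couriers.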