OFCOURSE: A Multi-Agent Reinforcement Learning Environment for Order Fulfillment

Published: 26 Sept 2023, Last Modified: 02 Nov 2023 · NeurIPS 2023 Datasets and Benchmarks Poster
Keywords: order fulfillment, multi-agent reinforcement learning, e-commerce
TL;DR: A simulated environment enables multi-agent reinforcement learning for order fulfillment.
Abstract:

The dramatic growth of global e-commerce has led to a surge in demand for efficient and cost-effective order fulfillment, which can increase customers' service levels and sellers' competitiveness. However, managing order fulfillment is challenging due to a series of interdependent online sequential decision-making problems. To clear this hurdle, rather than solving the problems separately as attempted in recent studies, this paper proposes a method based on multi-agent reinforcement learning to integratively solve the series of interconnected problems, encompassing order handling, packing and pickup, storage, order consolidation, and last-mile delivery. In particular, we model the integrated problem as a Markov game, wherein a team of agents learns a joint policy by interacting with a simulated environment. Since no simulated environment supporting the complete order fulfillment problem exists, we devise Order Fulfillment COoperative mUlti-agent Reinforcement learning Scalable Environment (OFCOURSE) in the OpenAI Gym style, which allows reproduction and re-utilization to build customized applications. By constructing the fulfillment system in OFCOURSE, we optimize a joint policy that solves the integrated problem, facilitating sequential order-wise operations across all fulfillment units and minimizing the total cost of fulfilling all orders within the promised time. With OFCOURSE, we also demonstrate that the joint policy learned by multi-agent reinforcement learning outperforms the combination of locally optimal policies. The source code of OFCOURSE is available at: https://github.com/GitYiheng/ofcourse.

Supplementary Material: pdf
Submission Number: 261

Paper Decision

Decision by Program Chairs, 22 Sept 2023, 01:32 (modified: 19 Dec 2023, 20:01)
Decision: Accept (Poster)
Comment:

The paper presents a very interesting formulation and benchmark for the order fulfillment problem, building the benchmark environment on the existing and popular OpenAI Gym interface. 3 out of 4 reviewers are in favor of accepting the paper, while at the same time providing reasonable suggestions for improving the benchmark to make it more useful to the RL community as well, which I agree with. The concern from the last reviewer is regarding the Markov game formulation, which the authors appear to have addressed, although the reviewer did not respond. In summary, I am in favor of accepting the paper with the suggested revisions for clarity.

The problem is interesting, but the documentation is not clear and the resemblance to real-world systems is not elaborated.

Official Review by Reviewer D3MQ, 26 Jul 2023, 10:19 (modified: 28 Nov 2023, 20:22)
Summary And Contributions:

This paper addresses the problem of efficient order fulfillment, which is essential in e-commerce scenarios and entails a series of interdependent online sequential decision-making tasks. Instead of treating these tasks separately, the authors propose a multi-agent reinforcement learning-based method to solve them as a cohesive whole, incorporating tasks like order handling, packing and pickup, storage, order consolidation, and last-mile delivery. This paper models the integrated problem as a Markov game and proposes the OFCOURSE environment to facilitate research on efficient solutions for this problem. The authors also design a joint policy that outperforms independent policies in reducing cost and time.

Strengths:

The considered problem is interesting and important. Research on this problem can encompass many business scenarios.

Rating: 4: Ok but not good enough - rejection
Opportunities For Improvement:
  • While the overall structure is clear, and the results are also very informative, the reviewer found the description of the order fulfillment task is not clear enough.
  • This paper mentioned several tasks, including handling, packing and pickup, storage, order consolidation, etc., but I did not see all of them included in the MARL framework, or specifically, how they come into the fulfillment agent described in Section 4.5.
  • Section 4 includes the description of both tasks and agents, which is confusing to me. The structure could be improved.
  • What are the agents in this environment? What are multiple agents? What are they supposed to collaborate on?
  • Instead of stating how the OFCOURSE environment is built up, it is very important to also state how this resembles the order fulfillment system in the real world, and how the algorithms developed based on OFCOURSE can be generalized to real world scenarios. For example, what’s the observation, what’s the cost? How are they defined?
Confidence: 4: The reviewer is confident but not absolutely certain that the evaluation is correct
Limitations:

No. The authors did not mention these in the paper.

Correctness:

The main contribution of this paper is the development of an order fulfillment environment. The reviewer found it difficult to evaluate its correctness without domain knowledge of logistics in e-commerce.

The authors will need to take this into account to improve the presentation of the context of the problem.

Clarity:

The writing is in general clear, while the reviewer found the structure confusing and some important information missing, such as how the design of such an OF environment helps solve real-world problems.

Relation To Prior Work:

Yes, it is clearly addressed

Documentation:

The dataset used for evaluation in task 1 is not released due to the disclosure policy. But the documentation is not clear to the reviewers, e.g. the format of the data, the size, the scenario, and why they selected this scenario.

Ethics:

No.

Flag For Ethics Review: 2: No, there are no or only very minor ethics concerns
Additional Feedback:

The problem is interesting, but the reviewer is struggling with associating the variables in Markov games with those in real-world OF systems, which is also not clearly elaborated in Section 4.

In addition, the authors used Markov games, but the considered algorithm, HAPPO, is for common-payoff games. Thus, the Markov game is not necessarily relevant in this case.

See Opportunities For Improvement for more details.

Response to Reviewer D3MQ (Part 2/2)

Official Comment by Authors, 19 Aug 2023, 12:45 (modified: 03 Jan 2024, 10:19)
Comment:

Comment 4. Instead of stating how the OFCOURSE environment is built up, it is very important to also state how this resembles the order fulfillment system in the real world, and how the algorithms developed based on OFCOURSE can be generalized to real world scenarios. For example, what’s the observation, what’s the cost? How are they defined?

Response.

We are grateful for your insightful comment. We fully understand the significance of illustrating the real-world relevance of the OFCOURSE environment and how the developed algorithms can be extrapolated to real-world scenarios. In direct response, we have worked to bridge this crucial gap by articulating the translation of order fulfillment elements into the corresponding Markov game framework. As noted in our responses to other comments, this detailed mapping is now in Section 3 of the revised paper. For your convenience, we offer the definitions of the observation and cost components as follows:

  • Observation: the agent's observation space comprises the aggregated observations of its constituent fulfillment units, with each unit's observations encompassing its containers and operators.
  • Cost: the total fulfillment cost for an order comprises the summed step-wise cost accrued plus any applicable overtime penalty fees.
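
To make these definitions more concrete, below is a minimal Python sketch (with hypothetical attribute and function names, not the exact OFCOURSE API) of how an agent's observation could be aggregated from its units and how an order's total cost could be computed:

```python
import numpy as np

def agent_observation(fulfillment_units):
    """Aggregate the agent's observation from its constituent fulfillment units,
    where each unit's observation covers its containers and operators."""
    unit_obs = [np.concatenate([u.container_state, u.operator_state])
                for u in fulfillment_units]
    return np.concatenate(unit_obs)

def order_cost(step_costs, finish_time, promised_time, overtime_penalty):
    """Total fulfillment cost: summed step-wise costs plus an overtime penalty
    if the order misses its promised time."""
    total = sum(step_costs)
    if finish_time > promised_time:
        total += overtime_penalty
    return total
```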

Please explore Section 3 for a comprehensive elaboration on these definitions and their practical implications. We hope that these clarifications have sufficiently addressed your considerations regarding connections to practical systems.

Comment 5. The dataset used for evaluation in task 1 is not released due to the disclosure policy. But the documentation is not clear to the reviewers, e.g. the format of the data, the size, the scenario, and why they selected this scenario.

Response.

Thank you for noting the need to better document our datasets. Please find the dataset at the following addresses.

The reasons for selecting Tasks 1 and 2 derive mainly from replicating real-world complexities. Task 1 demonstrates OFCOURSE's capacity to handle mixed physical and virtual orders, as occurs in practice. Meanwhile, Task 2 exemplifies a complex fulfillment task characterized by its extended and diverse fulfillment stages, which is commonly encountered in reality.

Comment 6. The problem is interesting, but the reviewer is struggling with associating the variables in Markov games with that in real-world OF systems, which is also not clearly elaborated in section 4.

Response.

We appreciate you finding this problem interesting and are sorry about the confusion. We think this comment is also related to the mappings from real-world fulfillment concepts to Markov game concepts, like Comments 2, 3, and 4. Similarly, please refer to Section 3 of the revised manuscript, where the variables of the Markov game, including the action space and observation space, have been explained. We hope our revised description can clear up your confusion.

Comment 7. In addition, the authors used Markov games, but the considered algorithms, HAPPO, are for common payoff games. Thus, Markov game is not necessary relevant in this case.

Response.

Thank you for your comments about applying HAPPO to Markov games. We note that HAPPO is not restricted to common-payoff games and is also commonly applied to Markov games.

  • For example, in page 3 line 6 of [Wen, M., Kuba, J., Lin, R., Zhang, W., Wen, Y., Wang, J., & Yang, Y. (2022). Multi-agent reinforcement learning is a sequence modeling problem. Advances in Neural Information Processing Systems, 35, 16509-16521.], the authors stated that "cooperative MARL problems are often modeled by Markov games", and in page 4, Section 2.3, line 10 of the same paper, the authors described HAPPO as one applicable algorithm.
  • As another example, in page 15 lines 26-28 of [Fu, W., Yu, C., Xu, Z., Yang, J., & Wu, Y. (2022). Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning. In International Conference on Machine Learning (pp. 6863-6877). PMLR.], the authors also applied HAPPO to six Markov games.

We hope you find our detailed responses satisfactory. If so, we would be grateful if you would consider adjusting your review rating accordingly. Additionally, please do not hesitate to provide any further suggestions, as we are eager to make improvements based on your insightful feedback.

Response to Reviewer D3MQ (Part 1/2)

Official Comment by Authors, 19 Aug 2023, 12:46 (modified: 03 Jan 2024, 10:19)
Comment:

We greatly appreciate you taking the time to thoroughly review our manuscript. Your comments have helped us improve the clarity and presentation of our work. According to your comments, we have revised the paper, and for your convenience, we have marked the revised content in ${\color{blue}blue}$ (please check our revised-version manuscript). Please find our detailed point-by-point responses below:

Comment 1. While the overall structure is clear, and the results are also very informative, the reviewer found the description of the order fulfillment task is not clear enough.

Response.

We agree that the previous manuscript could benefit from a clearer, more consolidated explanation of the order fulfillment task. To address this, we have made the following key improvements in the revised version:

  • Upfront, we now explicitly state that the core aim of the order fulfillment task is to minimize the overall fulfillment cost (lines 94-95).
  • We added a detailed narrative walking through the fulfillment process for an individual order, which illustrates the sequential decisions involved (lines 96-109).
  • Furthermore, we have visually depicted the order fulfillment task in Figure 1, accompanied by a textual description delineated in lines 132-152.

Comment 2. This paper mentioned several tasks, including handling, packing and pickup, storage, order consolidation, etc. but I did not see all of them included in the MARL framework, or specifically, how they come into the Fulfillment agent described in section 4.5. Section 4 includes the description of both tasks and agents, which is confusing to me. The structure could be improved.

Response.

We appreciate you highlighting the need to clearly link the fulfillment sub-tasks to the agent framework. To address this, we have made the following improvements in the revised version:

  • Before presenting the details of the MARL framework, Section 3 provides a description of the order fulfillment process for a single order, highlighting the various decision-making tasks involved. Following that, we model the process and the problems as a Markov game and introduce the necessary concepts and notations to build the MARL framework, including the fulfillment unit and agent. We now explicitly state that each sub-task (handling, packing and pickup, storage, order consolidation) is realized as a fulfillment unit, and the concatenation of these units defines an agent in the MARL framework detailed in Section 4.5. Please refer to lines 135-138 for the details.
  • After introducing all the components of the MARL framework, we overview the mapping between it and the order fulfillment sub-tasks; please refer to lines 286-293.

Comment 3. What are the agents in this environment? What are multiple agents? What are they supposed to collaborate on?

Response.

Thank you for noting that the agent definitions could be clearer. Since this comment is closely related to Comment 2, please refer to that response, where an agent is explicitly defined as the concatenation of fulfillment units. Regarding what multiple agents are and how they collaborate, we have provided additional clarification in the revised version of the manuscript:

  • Each fulfillment route is associated with a single fulfillment agent. Since there are numerous fulfillment routes, the system consequently contains multiple fulfillment agents. The crux of collaboration materializes within shared units such as warehouses, where these agents coordinate to optimize resource allocation across their respective orders.

Government regulations on the platform

Official Review by Reviewer HNVp, 24 Jul 2023, 14:43 (modified: 28 Nov 2023, 20:22)
Summary And Contributions:

The authors invent a model in which they view order fulfillment on an e-commerce platform as a multi-agent reinforcement learning problem. Different fulfillment units in different stages act as different agents in the game. The overall aim of the game is to minimize the cost of fulfillment. The interesting part of the paper is transforming a complicated OF problem into a reinforcement learning procedure.

Strengths:

The paper contributes to the literature by setting up a framework for solving the order fulfillment problem. This paper develops an algorithm to solve the problem by machine learning. By modelling the OF problem as a mean-field game, the paper also connects an operations research problem to economics. This may stimulate more work in economic research.

Rating: 10: Top 5% of accepted papers, seminal paper
Opportunities For Improvement:

E-commerce is highly regulated in many countries. I suggest the authors take government regulations into account when simulating the problem. Then the environment in the model is more realistic. Under this situation, the policy functions in the reinforcement learning are more practical.

Confidence: 4: The reviewer is confident but not absolutely certain that the evaluation is correct
Limitations:

The authors can also look at this complicated problem from the perspective of dynamic mechanism design. The algorithm of the e-commerce platform is a mechanism. The platform is searching for an optimal mechanism to minimize the cost.

Correctness:

I think that the claims made in the submission are correct.

Clarity:

The paper is succinct and well-written.

Relation To Prior Work:

The paper clearly states its contributions relative to the literature.

Documentation:

The paper is well-organized and the code is posted on Github.

Ethics:

I have no concern about ethics with the submission.

Flag For Ethics Review: 2: No, there are no or only very minor ethics concerns
Additional Feedback:

There is a trend of researchers connecting mean-field games to reinforcement learning. I think the authors can mention this point in the paper.

Response to Reviewer HNVp

Official Comment by Authors, 19 Aug 2023, 12:51 (modified: 03 Jan 2024, 10:19)
Comment:

We sincerely appreciate you taking the time to thoroughly review our work and provide insightful suggestions for improvement. Your comments have helped strengthen the manuscript's practical applicability and connections to broader research areas. Please find our responses below:

Comment 1. E-commerce is highly regulated in many countries. I suggest the authors take government regulations into account when simulating the problem. Then the environment in the model is more realistic. Under this situation, the policy functions in the reinforcement learning are more practical.

Response.

Thank you for the insightful suggestion to incorporate government regulations into the simulation environment. As you rightly note, including relevant regulatory constraints would enhance the model's realism and practical applicability. Our current approach implements such constraints through action masking, which temporarily zeros out the probabilities of unavailable discrete actions. For instance, to align with regulations in certain countries, two channel operations may be accessible to general cargo while only a dedicated channel is available for electric cargo. In this case, with operation probabilities of [0.10, 0.30, 0.60] for [general channel A, general channel B, electric channel], the masked probabilities for general cargo are [0.25, 0.75, 0.00], while the masked probabilities for electric cargo are [0.00, 0.00, 1.00]. Accordingly, an electric cargo is only allowed to be processed by the third operation. Incorporating specific government regulations as additional masks on action spaces could provide a natural way to build in relevant constraints. We appreciate you highlighting this potential model enhancement.
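
As a rough illustration (a minimal sketch with assumed names, not the exact OFCOURSE implementation), the masking and renormalization described above can be computed as follows:

```python
import numpy as np

def mask_and_renormalize(action_probs, available):
    """Zero out the probabilities of unavailable actions and renormalize the rest."""
    masked = np.where(available, action_probs, 0.0)
    return masked / masked.sum()

# [general channel A, general channel B, electric channel]
probs = np.array([0.10, 0.30, 0.60])
print(mask_and_renormalize(probs, np.array([True, True, False])))   # general cargo:  [0.25 0.75 0.  ]
print(mask_and_renormalize(probs, np.array([False, False, True])))  # electric cargo: [0.   0.   1.  ]
```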

Comment 2. The authors can also look at this complicated problem from a perspective of dynamic mechanism design. The algorithm of the e-commerce platform is mechanism. The platform is searching an optimal mechanism to minimize the cost.

Response.

Thank you for the insightful suggestion to explore mechanism design perspectives in e-commerce platforms. The current model incorporates the e-commerce platform in the order handling stage, where order processing takes a fixed unit of time and is charged at a predetermined price based on the service level. As you astutely noted, developing dynamic order handling mechanisms could be a fruitful avenue for future work.

Comment 3. There is a trend along which researchers connect the mean-field game to reinforcement learning. I think that the authors can mention this point in the paper.

Response.

Thank you for the insightful suggestion to connect our work to mean-field game theory. As you recommended, we have added content to the revised manuscript highlighting how OFCOURSE can model order fulfillment as a mean-field game when agents are homogeneous. Specifically, lines 305-307 now state: "When agents are homogeneous, OFCOURSE also supports modeling order fulfillment as a mean-field game." We appreciate you identifying this relevant approach and advising us to mention it in the paper. We would be grateful for any suggestions you may have regarding specific papers we should cite.

Thank you again for your insightful feedback and suggestions. Please let us know if you have any other thoughts on improving the manuscript and we are happy to update it accordingly. We greatly appreciate you taking the time to thoroughly review our work.

This paper introduces a novel approach to model order fulfillment, offering a comprehensive and flexible environment that covers all stages of the process. However, there are areas within the proposed model that could be further improved.

Official Review by Reviewer h6Wq, 21 Jul 2023, 13:42 (modified: 28 Nov 2023, 20:22)
Summary And Contributions:

This paper introduces OFCOURSE, a novel multi-agent simulation environment that represents the Order Fulfillment (OF) problem as a Markov game. According to the authors, OFCOURSE stands out as the very first environment that comprehensively incorporates full stages of order fulfillment, including order handling, packing and pickup, storage, as well as the last-mile delivery stage. Consequently, this advancement enables the training of decision-making algorithms in the entire process of Order Fulfillment.

Strengths:
  1. This paper presents the pioneering simulation environment that comprehensively incorporates all stages of the Order Fulfillment problem.
  2. OFCOURSE exhibits flexibility and ease of modification, supporting modular design and step-wise coordination. As a result, it has the potential to be highly adaptable across a wide range of domains.
Rating: 6: Marginally above acceptance threshold
Opportunities For Improvement:
  1. To enhance the benchmark, the authors may consider incorporating additional order fulfillment scenarios or MARL algorithms. Furthermore, it would be beneficial to include a thorough analysis of the RL and MARL results.
  2. As order fulfillment applications serve real-world scheduling purposes eventually, the authors could enrich the discussion by exploring topics such as policy transfer and addressing the gap between simulation and real-world environments, including sim2real challenges. This would contribute to a more comprehensive analysis and discussion of the practical implications of order fulfillment in real-world settings.
Confidence: 3: The reviewer is fairly confident that the evaluation is correct
Limitations:

OFCOURSE's consideration is primarily focused on price and time, while neglecting other details. Consequently, its applicability in practical scenarios or specialized cargo transportation may encounter certain limitations or gaps.

Correctness:

The claims made in the submission appear to be correct. It would also be better if the authors provided more MARL algorithms for the benchmark.

Clarity:

Well written and easy to follow.

Relation To Prior Work:

Yes, it is clearly discussed.

Documentation:

The provided details for the two sample tasks in the submission are sufficient. However, although the authors claim that the simulator is capable of accommodating various scenarios, they have not provided any relevant tutorials or instructional materials to support this assertion.

Ethics:

No

Flag For Ethics Review: 2: No, there are no or only very minor ethics concerns
Additional Feedback:

Suggestion for improvement: I noticed that in Task 2, the PPO algorithm outperformed IPPO. While this situation is not unusual, it may be beneficial to conduct a comparative analysis over independent MARL, MARL trained by CTDE, and single-agent RL algorithms. This would further emphasize the advantages of modeling the order fulfillment scenario as a Markov game rather than a Markov process. Question: How do you perceive the application of this simulator in real-world scenarios, taking into account considerations such as the sim2real gap, observability of agents, and transferability of policies?

Response to Reviewer h6Wq

Official Comment by Authors, 19 Aug 2023, 12:55 (modified: 03 Jan 2024, 10:19)
Comment:

We greatly appreciate you dedicating time to thoroughly review our manuscript and share thoughtful feedback. Please find our responses below:

Comment 1. I noticed that in Task 2, the PPO algorithm outperformed IPPO. While this situation is not unusual, it may be beneficial to conduct a comparative analysis over independent MARL, MARL trained by CTDE, and single-agent RL algorithms. This would further emphasize the advantages of modeling the order fulfillment scenario as a Markov game rather than a Markov process.

Response.

It is thoughtful of you to point out this issue. We would like to bring to your attention that we have already undertaken experiments encompassing the comparison of various algorithms, including single-agent RL (PPO), independent MARL (IPPO), and MARL trained by CTDE (HAPPO). This comprehensive comparative analysis is presented in Section 5, specifically lines 361-377. As you can observe from our experimental results, HAPPO has demonstrated superior performance compared to the other algorithms. This outcome underscores the advantage of adopting the Markov game model over the Markov decision process in the order fulfillment scenario. Additionally, we would like to reiterate that our choice to employ the Markov game model was motivated not only by its performance but also by its inherent scalability and adaptability to the dynamic nature of order fulfillment systems, as elaborated in lines 119-129.

Comment 2. How do you perceive the application of this simulator in real-world scenarios, taking into account considerations such as the sim2real gap, observability of agents, and transferability of policies?

Response.

We appreciate you raising the important issue of applicability. We acknowledge the ubiquitous challenges of the sim2real gap, partial observability, and transferability of policies faced by all MARL benchmarks and simulators. At the modeling level, we aimed to narrow sim2real gap by focusing on universal metrics of order-wise time and cost that generalize across fulfillment systems. To alleviate the partial observability issue, we implemented HAPPO as a baseline algorithm that follows the centralized training with decentralized execution paradigm. While it’s true that the complete closure of the sim2real gap might remain elusive, it’s important to emphasize the value of benchmarks and simulators in guiding practical decision-making and policy research. The applications of our simulator are two-fold:

  • Decision-Making Support: Our simulator can provide valuable insights by enabling simulated analysis of existing fulfillment systems. This offers a platform for decision-makers to explore different scenarios and make informed choices.
  • Policy Transfer Research: Our simulator is also conducive to researching policy transfer in order fulfillment scenarios. For instance, it can serve as a foundation for generating initial policies in small-scale real-world deployments through tabula rasa simulations. Subsequently, researchers can delve into techniques like domain adaptation, utilizing our framework as a versatile testing ground.

Comment 3. OFCOURSE's consideration is primarily focused on price and time, while neglecting other details. Consequently, its applicability in practical scenarios or specialized cargo transportation may encounter certain limitations or gaps.

Response.

We appreciate that you have highlighted this issue. The decision to focus primarily on price and time within OFCOURSE was a deliberate choice aimed at capturing the fundamental aspects of common order fulfillment settings. If the "other details" you referred to are constraints, our simulation environment implements them by action masking, namely temporarily zeroing out the probabilities of unavailable discrete actions. If the "other details" you referred to are resources, our simulation environment incorporates them as a predefined function. This function takes the number of orders being processed as input and generates a quoted price as output. By incorporating these details and leveraging the practical considerations of price and time, OFCOURSE is well-suited to addressing a wide range of real-world scenarios.
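
For the resource case, the predefined function could, for example, take a form like the following minimal sketch (hypothetical names and a simple linear surcharge for illustration; the actual function is configurable per fulfillment unit):

```python
def quoted_price(num_orders_in_process, base_price=1.0, congestion_rate=0.05):
    """Map the current processing load of a unit to a quoted per-order price.
    A simple linear congestion surcharge is assumed here for illustration."""
    return base_price * (1.0 + congestion_rate * num_orders_in_process)

# Example: a unit already handling 10 orders quotes a 50% higher price.
print(quoted_price(10))  # 1.5
```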

We sincerely appreciate your insightful feedback. Thank you again for your time and valued comments. Please advise any areas that would benefit from clarification or expansion. We welcome your expert perspective.

paper review

Official Review by Reviewer RuY6, 20 Jul 2023, 02:39 (modified: 28 Nov 2023, 20:22)
Summary And Contributions:

In order to efficiently proceed with the order fulfillment process, the authors introduce OFCOURSE, a multi-agent reinforcement learning benchmark. Previous studies approach the order fulfillment process using single-agent reinforcement learning algorithms: the entire process is divided into sub-problems, and traditional RL algorithms are used to solve each sub-problem. However, the interdependence of the series of online decisions makes the order fulfillment problem difficult to solve. By introducing a multi-agent framework that can perform the complete order fulfillment steps in the OpenAI Gym style, the authors test multiple MARL algorithms on OFCOURSE. The experimental results show the superiority of the multi-agent approach compared to a combination of locally optimal policies on the order fulfillment problem.

Strengths:
  • OFCOURSE is a framework for studying important order fulfillment problems in modern society, and even industry practitioners who do not study reinforcement learning or artificial intelligence can benefit from it.

  • While previous works consider the order fulfillment process by dividing it into individual sub-problems, OFCOURSE considers a centralized process. In addition, OFCOURSE also considers the interaction between the agents (orders) by incorporating MARL algorithms, rather than existing machine learning or single-agent reinforcement learning.

  • The paper describes well how the order fulfillment problem can be seen as a Dec-POMDP environment. In addition, the proposed modular design allows efficient customization of the environment depending on various purposes.

Rating: 6: Marginally above acceptance threshold
Opportunities For Improvement:
  1. The paper effectively demonstrates the significance of OFCOURSE as a solution for the order fulfillment problem. However, it would be beneficial to provide further elaboration to emphasize its importance as a benchmark for studying MARL algorithms. Introducing the advantages of OFCOURSE over existing MARL benchmarks would enhance its significance in the research field.

  2. The order fulfillment problem is well explained and easy to understand, but its relation to reinforcement learning seems unclear. It would be helpful to provide a comparison between the reward functions, states, and action spaces in the order fulfillment benchmark introduced in the paper and existing benchmarks. Additionally, explaining how a particular action leads to transitioning from one state to another state with a given probability would enhance the understanding of the problem.

  3. There seems to be a scale-up issue, and it would be helpful to have some explanation and research on this. For example, when multiple agents are incorporated into the OFCOURSE framework, each order is an agent, so if the number of orders grows very large, as in reality, the number of agents increases significantly. Consequently, the non-stationarity problem also intensifies, leading to an exponential increase in the observation and action space. As a result, the learning process may slow down.

  4. The part of the paper where the various reinforcement learning algorithms were experimented with on the OFCOURSE benchmark would be better understood with more details about the experiments. For instance, as explained in the main body of the paper, the reward function is set to be negative, but the return value in the experiment is positive; it would be helpful to include some mention of the scale of the return.

  5. Since the problem deals with order fulfillment, it would be beneficial to include experiments that investigate the performance as the number of orders increases. Additionally, including an experiment that measures the comparative improvement and efficiency of the given algorithm in relation to existing research on order fulfillment would provide valuable insights.

Confidence: 4: The reviewer is confident but not absolutely certain that the evaluation is correct
Limitations:

The paper provides appropriate limitations, and it seems that there is no potential negative societal impact.

Correctness:

Most claims in the paper seem correct, but there are some questions related to evaluation method/experimental design (Questions are stated above).

Clarity:

The paper is well-written and accessible to readers unfamiliar with order fulfillment.

Relation To Prior Work:

While previous research has considered each order fulfillment problem as a subproblem, this paper is described as the first to consider the complete order fulfillment problem in an integrated manner. The differences are clearly discussed in the paper well.

Documentation:

The source code URL is publicly available to use the OFCOURSE framework, and the experimental setup such as hyperparameters related to reinforcement learning is publicly available too.

Ethics:

I think there are no ethical issues.

Flag For Ethics Review: 2: No, there are no or only very minor ethics concerns
Additional Feedback:

There is no additional feedback.

Response to Reviewer RuY6 (Part 1/2)

Official Comment by Authors, 19 Aug 2023, 12:58 (modified: 03 Jan 2024, 10:19)
Comment:

First of all, we would like to express our appreciation to Reviewer RuY6 for your detailed comments and suggestions. We respond to your comments and suggestions as follows:

Comment 1. The paper effectively demonstrates the significance of OFCOURSE as a solution for the order fulfillment problem. However, it would be beneficial to provide further elaboration to emphasize its importance as a benchmark for studying MARL algorithms. Introducing the advantages of OFCOURSE over existing MARL benchmarks would enhance its significance in the research field.

Response.

Thank you for this valuable comment. The importance of OFCOURSE as a MARL benchmark is threefold:

  • OFCOURSE is the first MARL benchmark to model the complete order fulfillment problem, rather than focusing on individual subproblems.
  • OFCOURSE contributes as a heterogeneous MARL benchmark. By incorporating diverse agent types with distinct action spaces, OFCOURSE enables studying the complex coordination challenges that arise in heterogeneous benchmarks.
  • OFCOURSE is devised on the practical problem of order fulfillment. MARL algorithms developed and insights gained through OFCOURSE can be potentially translated beyond simulation to guide real-world deployments.

According to your suggestions, we have emphasized OFCOURSE's importance as a MARL benchmark in the updated manuscript (lines 35-37, 47-49, 76-79, and 301-308).

Comment 2. The order fulfillment problem is well explained and easy to understand, but its relation to reinforcement learning seems unclear. It would be helpful to provide a comparison between the reward functions, states, and action spaces in the order fulfillment benchmark introduced in the paper and existing benchmarks.

Response.

Thank you for this insightful advice. Per your recommendation, we have added a comparison of the state space, action space, and objective function between our fulfillment benchmark and existing benchmarks in Table 1 of Appendix C of the updated manuscript. As shown in the table, our benchmark's reward functions, states, and action spaces differ entirely from those in existing benchmarks. We agree with you that this comparison helps situate our contributions within the literature. Thank you again for this valuable advice.

Comment 3. Additionally, explaining how a particular action leads to transitioning from one state to another state with a given probability would enhance the understanding of the problem.

Response.

Thank you for the constructive feedback on elucidating state transitions resulting from actions. Per your recommendation, we have included supplementary code snippets in Appendix D of the updated manuscript to demonstrate how a particular action leads to a transition from one state to another with the given probability. For your convenience, we provide a brief overview here: "The joint action is input at the agent-team level. The team then calls its agents and provides each agent's individual action as input. Next, each agent calls its fulfillment units, passing the unit action as input to each unit. Within a fulfillment unit, the execute() function takes the unit action as input and changes the information for all orders carried by this fulfillment unit, based on the preset transition probabilities associated with the corresponding operation."
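
The following sketch (simplified, with assumed class and method names rather than the exact code from Appendix D) illustrates this top-down dispatch of the joint action:

```python
class FulfillmentUnit:
    def __init__(self, orders):
        self.orders = orders

    def execute(self, unit_action, rng):
        # Apply the chosen operation to every order carried by this unit,
        # sampling the outcome from the preset transition probabilities.
        for order in self.orders:
            order.advance(unit_action, rng)  # hypothetical order-level update

class Agent:
    def __init__(self, units):
        self.units = units

    def step(self, agent_action, rng):
        # The agent's individual action is split into one sub-action per unit.
        for unit, unit_action in zip(self.units, agent_action):
            unit.execute(unit_action, rng)

class AgentTeam:
    def __init__(self, agents):
        self.agents = agents

    def step(self, joint_action, rng):
        # The joint action contains one individual action per agent.
        for agent, agent_action in zip(self.agents, joint_action):
            agent.step(agent_action, rng)
```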

Response to Reviewer RuY6 (Part 2/2)

Official Comment by Authors, 19 Aug 2023, 13:01 (modified: 03 Jan 2024, 10:19)
Comment:

Comment 4. There seems to be a scale-up issue and it would be helpful to have some explanation and research on this. For example, when multi agents are incorporated into the OFCOURSE framework, each order is an agent, so if the number of orders increases very large as in reality, the number of agents increases significantly. Consequently, the non-stationary problem also intensifies, leading to an exponential increase in the observation and action space. As a result, the learning process may slow down.

Response.

Thank you for the perceptive observations highlighting potential scale-up challenges with multi-agent reinforcement learning frameworks like OFCOURSE. We appreciate you thoughtfully outlining this significant consideration. As you astutely noted, modeling individual orders as agents could lead to an explosion of the observation and action spaces as volumes increase, consequently intensifying the non-stationarity issue. However, OFCOURSE represents the concatenation of fulfillment units, rather than each order, as an individual agent. Our agent definition aims to facilitate operations over multiple orders, such as order consolidation. With our design, the number of agents scales with the established fulfillment routes, not the total order volume. Based on our historical data, the route count hardly exceeds 6, even as volumes grow. In a practical fulfillment system, agents can be divided into mutually exclusive groups (different groups share no resources) so that the team size can be further reduced. However, we concur that studying scalability and non-stationarity in order fulfillment would undoubtedly be meaningful for other scenarios in future work.

Comment 5. The part of the paper where the various reinforcement learning algorithms were experimented with on the OFCOURSE benchmark would be better understood with more details about the experiments. For instance, as explained in the main body of the paper, the reward function is set to be negative, but the return value in the experiment is set to be positive, it would be helpful to include some mention of the scale of the return.

Response.

Thank you for this insightful comment. As you rightly noted, our paper implements the common practice of defining the reward function as the negative cost offset by a configurable baseline reward. The empirical baseline used in our experiments is calculated by multiplying the expected order volume by the per-order penalty, which determines the return scale. Additionally, as you mention, the return is often further scaled to a range like [0, 1] depending on the algorithm's needs. Please find the detailed implementation at https://github.com/GitYiheng/ofcourse/blob/main/env/exp1_env.py, line 62.
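
A minimal sketch of this reward shaping (hypothetical parameter names; the exact values and scaling live in exp1_env.py) could look like:

```python
def shaped_reward(step_cost, expected_order_volume, per_order_penalty):
    """Reward = baseline minus cost, so a lower fulfillment cost yields a higher return.
    The baseline (expected order volume x per-order penalty) sets the return scale."""
    baseline = expected_order_volume * per_order_penalty
    reward = baseline - step_cost
    return reward / baseline  # optional rescaling toward [0, 1]
```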

We sincerely hope our detailed responses have sufficiently clarified any misunderstandings and addressed your comments. If you are satisfied with our responses, we would be grateful if you would consider adjusting your review rating accordingly. Please do not hesitate to provide any additional suggestions you may have and we welcome any feedback that could help further improve our work. Our aim is to fully resolve all of your thoughtful comments and questions. Thank you again for your time and valuable insights.

Replying to Response to Reviewer RuY6 (Part 2/2)

Official Comment by Reviewer RuY6

Official Comment by Reviewer RuY6, 21 Aug 2023, 06:20 (modified: 03 Jan 2024, 10:19)
Comment:

Thank you very much for providing detailed responses to the comments. Additionally, after reviewing the responses and the updated manuscript, my inquiries have been thoroughly addressed. I truly appreciate it. Furthermore, as all the points mentioned in the 'Opportunities For Improvement' have been incorporated into the manuscript, I have adjusted the review rating.

Replying to Official Comment by Reviewer RuY6

Response to Response by Reviewer RuY6

Official Comment by Authors, 21 Aug 2023, 09:00 (modified: 03 Jan 2024, 10:19)
Comment:

We appreciate you taking the time to thoroughly review our revised manuscript. Your insightful comments helped us to strengthen this work and we sincerely appreciate you raising the rating.