Abstract: In this paper, we use multi-agent reinforcement learning to analyze how drivers’ behavioral policies in on-demand delivery services (ODS) affect the service’s overall revenue and other aspects of the service. Specifically, we run simulations with agents built on two reinforcement learning methods, Proximal Policy Optimization (PPO) and Random Network Distillation (RND), to analyze the effect of drivers who, although seemingly irrational, prefer to look for new stores by exploring areas they have not visited before. In addition, each driver has a separate learning model, so drivers can adopt different behavior patterns and search strategies, which produces a variety of search behavior across the system. We conducted simulation experiments to evaluate system performance in three scenarios with different supply-demand balances. The results show that the presence of curious drivers increased revenue, especially when the number of orders was high relative to the number of drivers. This suggests that factors promoting exploratory behavior are important when designing behavioral policies for drivers in ODS.
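To make the notion of a "curious" driver concrete, the following is a minimal sketch of an RND-style novelty bonus. It is not the implementation used in the paper: the one-hot area encoding, the linear target and predictor "networks", the class name `RNDBonus`, the helper `shaped_reward`, and the weight `beta` are all assumptions chosen for illustration. The idea it shows is the general one behind RND: a predictor is trained to imitate a fixed random target on visited states, so its prediction error stays large in areas the driver has not visited and can serve as an intrinsic reward.

```python
import numpy as np


class RNDBonus:
    """Toy RND-style novelty bonus over one-hot area encodings (illustrative only)."""

    def __init__(self, n_areas, feat_dim=8, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_areas = n_areas
        self.lr = lr
        # Fixed, randomly initialised target mapping; it is never trained.
        self.target = rng.normal(size=(n_areas, feat_dim))
        # Predictor mapping, trained to imitate the target on visited areas.
        self.predictor = np.zeros((n_areas, feat_dim))

    def _one_hot(self, area):
        x = np.zeros(self.n_areas)
        x[area] = 1.0
        return x

    def bonus(self, area):
        """Intrinsic reward: mean squared error between predictor and target."""
        x = self._one_hot(area)
        err = x @ self.predictor - x @ self.target
        return float(np.mean(err ** 2))

    def update(self, area):
        """One gradient step on 0.5 * ||predictor(x) - target(x)||^2."""
        x = self._one_hot(area)
        err = x @ self.predictor - x @ self.target
        self.predictor -= self.lr * np.outer(x, err)


def shaped_reward(extrinsic, intrinsic, beta=0.5):
    """Hypothetical combination of delivery revenue and novelty bonus."""
    return extrinsic + beta * intrinsic


# A driver that keeps returning to area 0 and never visits area 3.
rnd = RNDBonus(n_areas=5)
before = rnd.bonus(0)
for _ in range(50):
    rnd.update(0)
after_visited = rnd.bonus(0)
unvisited = rnd.bonus(3)
```

In this sketch the bonus for area 0 shrinks with every visit while the bonus for area 3 stays at its initial value, so an agent maximizing `shaped_reward` is drawn toward unvisited areas. In a per-driver design such as the one described above, each driver would hold its own instance of such a bonus alongside its own policy.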