A Reinforcement Learning and Prediction-Based Lookahead Policy for Vehicle Repositioning in Online Ride-Hailing Systems

Published: 01 Jan 2024, Last Modified: 21 Feb 2024. IEEE Trans. Intell. Transp. Syst., 2024.
Abstract: Existing approaches to vehicle repositioning on large-scale ride-hailing platforms either ignore the real-time spatial-temporal mismatch between supply and demand or overlook the long-term balance of the system. To account for both, in this paper we propose a lookahead repositioning policy, a novel approach that repositions idle vehicles from both a dynamic-system and a long-term performance perspective. Our method consists of two parts: the first uses linear programming (LP) to formulate the nonstationary system as a time-varying, $T$-step lookahead optimization problem and explicitly models the fraction of drivers who follow repositioning recommendations (the repositioning rate); the second incorporates a reinforcement learning (RL) method to maximize long-term return based on value functions learned for the period after the $T$ time slots. Extensive studies on a real-world dataset with both small-scale and large-scale simulators show that our method outperforms previous baseline methods and is robust to prediction errors.
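As a rough illustration of the two-part design described in the abstract (not the authors' code), the sketch below sets up a $T$-step lookahead LP in Python with cvxpy: recommended zone-to-zone flows are scaled by a repositioning rate, matched demand is rewarded in each slot, and a learned per-zone value vector `V` prices the idle supply remaining after the horizon. The zone-level dynamics, the helper name `lookahead_reposition`, and the toy numbers are illustrative assumptions; the paper's actual LP formulation is richer.

```python
# Minimal sketch of a T-step lookahead repositioning LP with a repositioning
# rate and a terminal value function. Simplified dynamics; not the paper's LP.
import numpy as np
import cvxpy as cp

def lookahead_reposition(supply0, demand, beta, V):
    """One planning pass of a T-step lookahead LP.

    supply0 : (N,)   idle vehicles per zone at the current time slot
    demand  : (T, N) predicted ride requests per zone for the next T slots
    beta    : float  repositioning rate (fraction of drivers who comply)
    V       : (N,)   learned value of an idle vehicle in each zone after slot T
    Returns the recommended zone-to-zone repositioning flows for the first slot.
    """
    T, N = demand.shape
    flows = [cp.Variable((N, N), nonneg=True) for _ in range(T)]  # recommended moves i -> j
    served = cp.Variable((T, N), nonneg=True)                     # requests actually fulfilled

    s, cons, reward = supply0, [], 0
    for t in range(T):
        moved = beta * flows[t]           # only a beta-fraction of drivers follow the advice
        out_flow = cp.sum(moved, axis=1)  # realized vehicles leaving each zone
        in_flow = cp.sum(moved, axis=0)   # realized vehicles arriving in each zone
        cons.append(out_flow <= s)        # realized moves cannot exceed idle supply
        s_post = s - out_flow + in_flow   # idle supply after repositioning
        cons += [served[t] <= s_post, served[t] <= demand[t]]
        reward += cp.sum(served[t])       # immediate reward: matched requests in slot t
        s = s_post - served[t]            # vehicles on trips leave the idle pool (returns ignored here)

    reward += V @ s                       # RL value of the post-horizon state, per the two-part design
    cp.Problem(cp.Maximize(reward), cons).solve()
    return flows[0].value                 # only the first slot's recommendations are executed

# Toy usage: 3 zones, 4-slot horizon, 60% compliance, hand-picked terminal values.
if __name__ == "__main__":
    supply0 = np.array([10.0, 2.0, 0.0])
    demand = np.array([[1.0, 5.0, 4.0]] * 4)
    print(lookahead_reposition(supply0, demand, beta=0.6, V=np.array([0.2, 0.5, 0.4])))
```

In this sketch, only the first slot's flows are executed and the LP is re-solved at the next slot, which mirrors the rolling use of a lookahead policy; the terminal term `V @ s` stands in for the learned value functions that account for returns after the $T$ slots.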