OfflineLight: An Offline Reinforcement Learning Model for Traffic Signal Control

Qiang Wu; Mingyuan Li; Jun Shen; Bo Du; Hongling Zheng; Jiahao Wang

23 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Reinforcement learning, Traffic signal control, Offline RL, Offline-AC, TSC-OID

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

TL;DR: We propose a general offline actor-critic framework and an adaptive decision-making model for traffic signal control.

Abstract: Reinforcement learning (RL) is gaining popularity for addressing traffic signal control (TSC) problems. Yet the trial-and-error training of traditional RL-based methods, which relies on interaction with the environment, is costly and time-consuming. It is also challenging to directly deploy a fully pre-trained RL model to all types of intersections. Inspired by recent advances in offline RL for decision-making systems, we propose a general offline actor-critic framework (Offline-AC) that considers both policy and value constraints, and an adaptive decision-making model named OfflineLight that is based on Offline-AC. Offline-AC is further proved to be general and suitable for developing new offline RL algorithms. Moreover, we collect, organize, and release the first offline interaction dataset for TSC (TSC-OID), generated by state-of-the-art (SOTA) RL models interacting with a traffic simulation environment based on multiple real-world datasets of road intersections and traffic flow. Through numerical experiments on real-world datasets, we demonstrate that: (1) offline RL can build a high-performance RL model without online interactions with the traffic environment; (2) OfflineLight matches or achieves SOTA performance among recent RL methods; and (3) OfflineLight shows comprehensive generalization performance after training on only 20% of the TSC-OID dataset. The dataset and code are available at the anonymous URL: https://anonymous.4open.science/r/OfflineLight-6665/README.md.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6980
