A Centralized Reinforcement Learning-Based Method for Traffic Signal Optimization Using an Adaptive Sequential Decision

Published: 2025. Last Modified: 30 Jan 2026. IEEE Trans. Intell. Transp. Syst. 2025. License: CC BY-SA 4.0
Abstract: Deep reinforcement learning (DRL) algorithms have proven effective for optimizing traffic signal timing schemes and helpful for alleviating urban traffic congestion. Recently, the centralized control strategy has attracted great attention for coordinated signal control of multiple intersections, surpassing decentralized methods. However, this strategy suffers from two problems: the curse of dimensionality in the joint action space and credit assignment. Namely, the dimension of the global action space increases exponentially with the number of agents, and individual traffic signals cannot identify their own contributions to the optimization of regional traffic. To solve these two problems, this study proposes a centralized DRL algorithm for multiple signals using sequential decision-making, named Sequential Light (SeqLight). First, a centralized DRL framework with sequential-decision logic is constructed to decouple the joint actions of multiple agents, reducing the complexity of the action space to a polynomial level. Second, the multi-agent advantage value is decomposed into sequential advantage evaluations of every local agent, thus alleviating the credit assignment problem and improving monotonic performance. Third, an adaptive optimization model for the decision order is developed for traffic signal control, which coordinates the signal actions of local agents to learn the globally optimal policy. Finally, the proposed algorithm is verified by simulations on an actual road network. The simulation results show that, compared with the baseline centralized RL method, the proposed method reduces the average vehicle queue length by 12.64% and 14.20% under medium and high traffic demand conditions, respectively.
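The core complexity claim in the abstract can be illustrated with a small sketch. This is not SeqLight itself; the agent counts, phase counts, and the placeholder critic `q_value` are all hypothetical stand-ins for the paper's learned value network. The point is only the search-cost difference: a naive centralized controller evaluates all K^N joint actions, while a sequential decision scheme has each agent extend the partial joint action conditioned on earlier agents' choices, evaluating only N * K candidates in total.

```python
import itertools

N_AGENTS = 4   # hypothetical number of intersections
N_PHASES = 3   # hypothetical signal phases per intersection

def q_value(joint_action):
    # Placeholder critic: stands in for a learned centralized value network.
    # Peaks when every intersection selects phase 1.
    return -sum((a - 1) ** 2 for a in joint_action)

def joint_search():
    # Naive centralized search: enumerates all K^N joint actions (3^4 = 81 here).
    return max(itertools.product(range(N_PHASES), repeat=N_AGENTS), key=q_value)

def sequential_search():
    # Sequential decision: agents commit one at a time, each conditioning on the
    # actions already chosen. Undecided agents are padded with a default phase 0,
    # so only N * K = 12 candidates are evaluated in total.
    chosen = []
    for i in range(N_AGENTS):
        pad = (0,) * (N_AGENTS - i - 1)
        best = max(range(N_PHASES),
                   key=lambda a: q_value(tuple(chosen) + (a,) + pad))
        chosen.append(best)
    return tuple(chosen)
```

Under this (deliberately separable) toy critic, both searches recover the same joint action, but the sequential scheme's evaluation count grows linearly in the number of agents rather than exponentially, which is the polynomial-complexity property the abstract attributes to SeqLight.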