$\pi$-eLight: Learning Interpretable Programmatic Policies for Effective Traffic Signal Control

Published: 2026 · Last Modified: 15 Jan 2026 · IEEE Trans. Mob. Comput. 2026 · CC BY-SA 4.0
Abstract: Recent advances in Deep Reinforcement Learning (DRL) have significantly improved the performance of adaptive Traffic Signal Control (TSC). However, DRL policies are typically represented by over-parameterized neural networks, which function as black-box models. Consequently, the learned policies often lack interpretability and are challenging to deploy on resource-constrained edge hardware. Moreover, DRL methods frequently exhibit poor generalization, struggling to transfer learned policies across different geographical regions. These limitations hinder the real-world applicability of learning-based approaches. To address these issues, we propose representing the control policy as an inherently interpretable program. We present Programmatic Interpretable reinforcement learning for effective traffic signal control ($\pi$-eLight), a new approach designed to autonomously discover non-differentiable programs. Specifically, we first define an effective program framework as the control policy, in which certain components remain learnable. Next, we introduce a Domain Specific Language (DSL) for constructing interpretable programs, together with transformation rules for generating programs with hierarchical structures. Finally, we use Monte Carlo Tree Search (MCTS) to find the optimal program in this discrete space. Extensive experiments demonstrate that $\pi$-eLight consistently outperforms DRL-based baselines while exhibiting superior generalization across intersections in different cities. Moreover, the learned programmatic policies can be deployed directly on edge devices with minimal computational resources, further enhancing real-world applicability.
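To make the search procedure concrete, the following is a minimal, self-contained sketch of MCTS over a discrete program space. The DSL here is a deliberately tiny stand-in (a program is just a `(feature, threshold)` pair controlling a two-phase toy intersection), and the simulator, reward, and grammar are illustrative assumptions, not the paper's actual DSL, transformation rules, or evaluation environment.

```python
import math
import random

# Toy DSL (hypothetical, far simpler than the paper's grammar):
# a program is (feature_idx, threshold); the policy gives phase 0
# the green light when queue[feature_idx] > threshold, else phase 1.
FEATURES = [0, 1]        # which queue the program inspects
THRESHOLDS = [1, 3, 5]   # comparison constants

def evaluate(program, steps=50, seed=0):
    """Reward of a candidate program on a crude two-phase simulation:
    negative average total queue length (higher is better)."""
    feat, thr = program
    rng = random.Random(seed)
    q = [0, 0]
    total = 0
    for _ in range(steps):
        q[0] += rng.random() < 0.6       # Bernoulli arrivals, phase 0
        q[1] += rng.random() < 0.3       # Bernoulli arrivals, phase 1
        green = 0 if q[feat] > thr else 1
        q[green] = max(0, q[green] - 2)  # green phase discharges 2 cars
        total += q[0] + q[1]
    return -total / steps

class Node:
    def __init__(self, choices, parent=None):
        self.choices = choices   # untried child actions (grammar expansions)
        self.parent = parent
        self.children = {}       # action -> Node
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    """Pick the child maximizing the UCB1 score."""
    best, best_score = None, -math.inf
    for a, ch in node.children.items():
        s = ch.value / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
        if s > best_score:
            best, best_score = (a, ch), s
    return best

def mcts(iterations=200, seed=1):
    rng = random.Random(seed)
    root = Node(list(FEATURES))  # depth 0: choose a feature
    for _ in range(iterations):
        # 1. Selection: descend through fully expanded nodes via UCT.
        node, path = root, []
        while not node.choices and node.children:
            a, node = uct(node)
            path.append(a)
        # 2. Expansion: attach one untried grammar expansion.
        if node.choices:
            a = node.choices.pop()
            child = Node(list(THRESHOLDS) if len(path) == 0 else [], parent=node)
            node.children[a] = child
            path.append(a)
            node = child
        # 3. Rollout: randomly complete the partial program, then score it.
        completion = list(path)
        if len(completion) < 2:
            completion.append(rng.choice(THRESHOLDS))
        reward = evaluate(tuple(completion))
        # 4. Backpropagation: update statistics along the path.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract the most-visited complete program.
    node, best_path = root, []
    for _ in range(2):
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        best_path.append(a)
    return tuple(best_path)
```

Calling `mcts()` returns the highest-visit-count program, e.g. a pair like `(0, 3)`; the same select/expand/rollout/backpropagate loop applies unchanged when the grammar is richer, only the node expansion rules and the evaluator change.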