Constrained Reinforcement-Learning-Enabled Policies With Augmented Lagrangian for Cooperative Intersection Management
Abstract: Traffic control at signal-free intersections is extensively studied to facilitate cooperative traffic for connected and autonomous vehicles (CAVs). Reinforcement learning (RL) techniques have proven effective for cooperative intersection management (CIM) challenges, but often involve unsafe states due to the arbitrary exploration of the trial-and-error mechanism. To tackle the safety challenges associated with current RL-based CIM methods, this article proposes a safety-augmented CIM (SACIM) method. Initially, we introduce a constrained RL framework that integrates the augmented Lagrangian method with proximal policy optimization to address a constrained Markov decision process (CMDP). A policy network is designed to optimize performance, while multiple value networks are employed to evaluate policy performance and safety. By incorporating Lagrange multipliers and quadratic penalties, the method effectively transforms constrained optimization problems into unconstrained primal-dual problems, achieving an optimal solution without requiring strong convexity. Simultaneously, we incorporate communication delays and long- and short-term costs into the CMDP formulation to enhance safe and efficient policy exploration, closely mirroring real-world scenarios. The long-term cost reflects traffic safety related to collisions, while the short-term cost accounts for the driving risks associated with safety violations during vehicle interactions. Furthermore, our method integrates a motion-prediction-based, in-the-loop safety layer, facilitating rapid and robust policy learning. Through this safety enhancement design, SACIM effectively resolves the CIM issue within the CMDP framework, yielding a safe and reliable CIM policy. Simulation results demonstrate that our method significantly improves traffic safety, efficiency, comfort, and inference time, outperforming various methods based on rules, optimal control, and RL.
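The abstract's transformation of a constrained problem into an unconstrained primal-dual one via Lagrange multipliers and quadratic penalties can be illustrated with a minimal sketch. The function names, the constraint budget `limit`, and the penalty coefficient `mu` below are illustrative assumptions, not the paper's notation; the code shows only the standard augmented-Lagrangian penalty for a single inequality constraint (expected cost ≤ budget) and the corresponding dual-ascent multiplier update.

```python
def augmented_lagrangian_term(cost, limit, lam, mu):
    """Penalty added to the policy loss for one inequality constraint.

    cost:  estimated expected constraint cost under the current policy
    limit: allowed cost budget, so the constraint is cost - limit <= 0
    lam:   current Lagrange multiplier (lam >= 0)
    mu:    quadratic penalty coefficient (mu > 0)
    """
    g = cost - limit  # violation amount; g <= 0 means the constraint holds
    # Standard augmented-Lagrangian form for an inequality constraint:
    # vanishes when the constraint is satisfied and lam = 0,
    # grows quadratically with the violation otherwise.
    return (max(0.0, lam + mu * g) ** 2 - lam ** 2) / (2.0 * mu)


def update_multiplier(cost, limit, lam, mu):
    """Dual-ascent step: raise lam when violated, decay it toward 0 otherwise."""
    return max(0.0, lam + mu * (cost - limit))
```

In a constrained-RL training loop of this kind, the penalty term is added to the (negated) policy objective each iteration, and the multiplier is updated on a slower timescale; the quadratic term is what removes the need for strong convexity in plain Lagrangian dual ascent.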