Generative policy-driven HAC reinforcement learning for autonomous driving incident response

Published: 01 Jan 2026, Last Modified: 05 Oct 2025 · Future Gener. Comput. Syst. 2026 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) has become a pivotal approach to autonomous driving decision problems owing to its superior decision optimization capabilities. Existing discrete-time RL frameworks based on Markov decision process modeling face significant challenges in incident response control: they lead to high collision rates under low-frequency decision-making and severe action oscillations under high-frequency decision-making. The fundamental limitation is that discrete-time RL methods cannot adapt to real driving scenarios, where vehicle decisions rely on continuous-time dynamic system modeling. To address this, we propose a generative policy-driven Hamilton-Jacobi-Bellman Actor-Critic (HAC) RL framework, which leverages the Actor to generate action policies and extends continuous-time Hamilton-Jacobi-Bellman capabilities to discrete-time Actor-Critic frameworks through Lipschitz constraints on vehicle control actions. Specifically, the HAC framework integrates deep deterministic policy gradient (DDPG) to implement HJ-DDPG, which incorporates two optimizations, delayed policy network updates and dynamic parameter-space noise, to enhance policy evaluation accuracy and exploration capability. Experimental results demonstrate that vehicles trained with the proposed method achieved 52% lower average jerk and 48% lower steering rates than the baseline method (Proximal Policy Optimization, PPO) under high-speed conditions, resulting in smoother and safer lane-changing maneuvers.
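To make the abstract's three ingredients concrete, the following is a minimal sketch (not the authors' code) of how they can be layered on top of standard DDPG: a Lipschitz bound on how fast control actions may change between steps, delayed actor updates, and dynamic parameter-space exploration noise. Network sizes, the environment interface, and all hyperparameters here are illustrative assumptions.

```python
# Minimal sketch of DDPG with (1) a Lipschitz constraint on the action rate,
# (2) delayed actor updates, and (3) dynamic parameter-space noise.
# All names and hyperparameters are hypothetical, not taken from the paper.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def lipschitz_clip(action, prev_action, lipschitz_const, dt):
    """Bound the per-step change of the control action: |a_t - a_{t-1}| <= L * dt."""
    max_delta = lipschitz_const * dt
    return prev_action + torch.clamp(action - prev_action, -max_delta, max_delta)

def perturb_actor(actor, noise_std):
    """Parameter-space noise: act with a copy of the actor whose weights are perturbed."""
    noisy = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * noise_std)
    return noisy

def adapt_noise_std(actor, noisy_actor, obs, noise_std, target_dist=0.1):
    """Dynamically rescale the parameter noise so the induced action perturbation
    stays near a target distance in action space."""
    with torch.no_grad():
        dist = (actor(obs) - noisy_actor(obs)).pow(2).mean().sqrt().item()
    return noise_std * 1.01 if dist < target_dist else noise_std / 1.01

def update(actor, actor_target, critic, critic_target, batch,
           actor_opt, critic_opt, step, policy_delay=2, gamma=0.99, tau=0.005):
    obs, act, rew, next_obs, done = batch
    # Critic update with the standard DDPG TD target.
    with torch.no_grad():
        target_q = rew + gamma * (1 - done) * critic_target(next_obs, actor_target(next_obs))
    critic_loss = nn.functional.mse_loss(critic(obs, act), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Delayed policy update: refresh the actor and the target nets
    # only every `policy_delay` critic updates.
    if step % policy_delay == 0:
        actor_loss = -critic(obs, actor(obs)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for net, tgt in ((actor, actor_target), (critic, critic_target)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```

In this reading, `lipschitz_clip` is what keeps consecutive steering and acceleration commands from jumping arbitrarily between decision steps (the source of the jerk and steering-rate reductions claimed above), while `perturb_actor`/`adapt_noise_std` and the `policy_delay` branch correspond to the two DDPG optimizations named in the abstract.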