Towards robust motion control in multi-source uncertain scenarios by robust policy iteration

Jie Li, Letian Tao, Wenjun Zou, Yuhang Zhang, Bin Shuai, Jingliang Duan, Shengbo Eben Li, Hao Sun, Yiru Wang, Yu Gao, Yuwen Heng, Anqing Jiang

Published: 01 Dec 2025, Last Modified: 17 Nov 2025 · Communications in Transportation Research · CC BY-SA 4.0
Abstract: The adoption of neural networks for motion control modules has emerged as a critical direction in the advancement of end-to-end autonomous driving. However, few studies have comprehensively addressed the challenges of robustness and generalization in motion control policies, including long-tailed distributions, distribution shift, and the sim-to-real gap. In practical applications, motion control performance is compromised by diverse uncertainties, posing substantial challenges to real-world deployment. This work develops a training system to enhance the robustness and generalization of motion control policies for driving through multiple intersections. We first construct a task library comprising six driving scenarios, which are allocated to different sampling processes to rebalance the proportion of monotonous and edge-case scenarios. Next, we formulate a zero-sum game between uncertainties and driving actions, with smoothness constraints imposed within the range of observation noise. The driving policy is optimized for worst-case performance by the proposed robust policy iteration method. The worst case is approximated via Taylor expansion, which avoids the computational burden of adversarial training on behavior disturbances. The resulting approximation decouples model mismatches to ensure robust performance, and action smoothness is improved through a penalty function method. Finally, the motion control performance and the robustness of the driving policy are thoroughly validated by configuring the behavior patterns of traffic participants, ego dynamic parameters, and observation noise intensities in the simulation environment. Physical vehicle experiments on public urban roads further demonstrate the robustness and generalization of the driving policy learned from simulations.
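The idea of approximating a worst case by Taylor expansion, rather than training an explicit adversary, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the expansion order, the disturbance set, or the objective. The sketch assumes a first-order expansion, an L2-bounded disturbance of radius `eps`, and a hypothetical toy value function `toy_value`; all function names are illustrative. Under those assumptions, the minimum of f(x + d) over ||d|| <= eps is approximately f(x) - eps * ||grad f(x)||.

```python
import math


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at the point x (a list)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def worst_case_first_order(f, x, eps):
    """First-order Taylor estimate of min over ||d||_2 <= eps of f(x + d).

    The linearized objective f(x) + grad . d is minimized by d = -eps * grad / ||grad||,
    which gives f(x) - eps * ||grad||. No inner adversarial optimization is needed.
    """
    g = numerical_grad(f, x)
    return f(x) - eps * math.sqrt(sum(gi * gi for gi in g))


def toy_value(x):
    """Hypothetical smooth value function, used only for illustration."""
    return -sum(xi * xi for xi in x)


if __name__ == "__main__":
    state = [1.0, 2.0]
    eps = 0.1
    approx = worst_case_first_order(toy_value, state, eps)
    # For this toy function the exact worst case is -(||x|| + eps)^2.
    exact = -(math.sqrt(sum(s * s for s in state)) + eps) ** 2
    print(approx, exact)
```

For this quadratic toy function the first-order estimate differs from the exact worst case by eps squared, which shows the trade-off: the approximation is cheap but only accurate for small disturbance radii.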