Safety-Advanced Autonomous Driving for Urgent Hazardous Situations using Q-Compared Soft Actor-Critic
Keywords: Autonomous Driving, Reinforcement Learning, Imitation Learning, Hybrid Learning
TL;DR: We propose a hybrid learning technique that trains optimal autonomous driving policies from immature demonstration data for urgent, hazardous situations, yielding the world's first technology that safely controls oversteer while avoiding obstacles ahead.
Abstract: Autonomous vehicles must be capable of safe driving under all conditions to ensure passenger safety. This includes urgent hazardous situations (UHS), such as skidding on slippery roads or tire grip saturation during high-speed driving, which are not only difficult for even expert human drivers to handle but also challenging for autonomous driving technologies aiming to surpass human capabilities. Although recent advancements in machine learning, including imitation learning (IL), reinforcement learning (RL), and hybrid learning (HL), have enabled safe navigation of autonomous vehicles in various complex scenarios, they have fundamental limitations in UHS. Driving policies trained via IL degrade in novel situations where expert demonstration data is scarce or of poor quality, and RL struggles to develop optimal driving policies in UHS, which involve broad state and action spaces and high transition variance. HL techniques combining IL and RL also fall short, as they require near-optimal demonstration data, which is nearly impossible to obtain in UHS because human drivers can rarely react appropriately in such situations.
To address these limitations, we propose a novel HL technique, Q-Compared Soft Actor-Critic (QC-SAC), which effectively utilizes immature demonstration data to develop optimal driving policies and adapt quickly to novel situations in UHS. QC-SAC evaluates the quality of demonstration data based on the action value Q, prioritizing beneficial data and disregarding detrimental data. Furthermore, QC-SAC improves the performance of the Q-network by leveraging demonstration data and enhances learning by rapidly incorporating new successful experiences from ongoing interactions, enabling fast adaptation to new situations. We test QC-SAC in two extreme UHS scenarios: oversteer control with collision avoidance (OCCA) and time-trial race (TTR). In OCCA, QC-SAC achieves a success rate 2.36 times higher than existing techniques, and in TTR, it reduces lap time by more than 13.6\% while completing 300 test runs without a single failure. By proposing an innovative HL technique capable of training superior driving policies with immature demonstration data, we provide a solution for autonomous driving in UHS and introduce the world's first safety-advanced autonomous driving technology capable of safely controlling vehicle oversteer and avoiding obstacles ahead.
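For illustration only, below is a minimal PyTorch-style sketch of the Q-comparison idea described in the abstract: demonstration actions are used for the actor update only when the critic values them above the current policy's own actions. The q_net(states, actions) and policy.sample(states) interfaces are assumptions for this sketch, not the paper's actual implementation.

    import torch

    def qc_filtered_demo_loss(q_net, policy, demo_states, demo_actions):
        # Assumed (hypothetical) interfaces: q_net(s, a) -> Q-values of shape (batch, 1);
        # policy.sample(s) -> (action, log_prob).
        with torch.no_grad():
            pi_actions, _ = policy.sample(demo_states)
            q_demo = q_net(demo_states, demo_actions)   # value of demonstrated action
            q_pi = q_net(demo_states, pi_actions)       # value of current policy action
            beneficial = (q_demo > q_pi).float()        # 1 = keep demo, 0 = disregard

        # Behavior-cloning-style pull toward beneficial demonstrations only;
        # in practice this term would be combined with the usual SAC actor objective.
        new_actions, _ = policy.sample(demo_states)
        bc_error = ((new_actions - demo_actions) ** 2).sum(dim=-1, keepdim=True)
        return (beneficial * bc_error).mean()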
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10934