READ: End-to-End Autonomous Driving Made Safer with Efficient Reinforcement Learning

13 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Autonomous Driving, Reinforcement Learning
Abstract: Autonomous driving planning requires synthesizing perceptual cues into safe and efficient trajectories, yet current end-to-end models trained solely by imitation learning often suffer from limited behavioral diversity and distributional mismatch. To this end, we introduce READ, a reinforcement learning-based fine-tuning framework that significantly enhances pre-trained end-to-end driving models through structured policy refinement. Our approach is grounded in the observation that, although certain models already support diverse trajectory generation, their output probability distributions over the action space are biased toward imitation rather than optimality. READ efficiently recalibrates these distributions using lightweight RL updates, avoiding catastrophic forgetting while promoting high-reward behaviors. Our approach also incorporates a novel reward decomposition strategy designed to resolve the inefficiency of training with a composite reward signal. Such signals obscure which behaviors lead to success, making it difficult for the policy to discern and reinforce high-reward patterns. Our method decomposes the reward into semantically clear components, each providing a well-defined optimization objective, enabling the policy to independently learn and balance distinct objectives. This leads to more efficient exploration, better credit assignment, and significantly improved convergence compared to using a single composite reward. Evaluated on the NavSim benchmark with DiffusionDrive as the baseline, READ significantly enhances driving performance with only minimal fine-tuning: it raises the PDMScore from 87.7 to 88.8 after only 2 epochs of training with a learning rate of $4.5 \times 10^{-5}$, compared to the original 100 training epochs at a rate of $6.4 \times 10^{-4}$. Further open-loop evaluations on the nuScenes dataset show that READ reduces the collision rate of the original DiffusionDrive (nuScenes branch) baseline by over 60\% (from 0.088\% to 0.031\%) while maintaining a comparable L2 error (58.56 vs. 58.32) after the same brief training of one epoch (about 20 minutes), demonstrating its capacity to surpass expert demonstrations and learn safer driving policies. READ provides an efficient and effective pathway for reinforcement learning-based optimization in safety-critical autonomous driving systems.
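As a rough illustration of the kind of fine-tuning the abstract describes, the sketch below applies a REINFORCE-style update with a decomposed reward (collision, drivable-area, progress, and comfort components, echoing PDMScore-style terms) and a KL penalty toward the frozen pre-trained policy to limit forgetting. The toy policy architecture, placeholder reward functions, component weights, and KL coefficient are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): policy-gradient fine-tuning of a
# pre-trained planner with a decomposed reward and a KL penalty toward the
# original policy. All component definitions and hyperparameters below are
# assumptions for illustration only.
import torch
import torch.nn as nn

class TrajectoryPolicy(nn.Module):
    """Toy stand-in for a pre-trained end-to-end planner that outputs a
    categorical distribution over K candidate trajectories."""
    def __init__(self, obs_dim=64, num_candidates=20):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                  nn.Linear(128, num_candidates))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.head(obs))

def decomposed_rewards(actions):
    """Return per-component rewards instead of one composite scalar.
    Random placeholders here; in practice each term would come from a
    simulator or rule-based scorer (collision check, drivable-area check, ...)."""
    batch = actions.shape[0]
    return {
        "no_collision":  torch.rand(batch),  # collision-free indicator
        "drivable_area": torch.rand(batch),  # drivable-area compliance
        "progress":      torch.rand(batch),  # normalized ego progress
        "comfort":       torch.rand(batch),  # acceleration/jerk comfort
    }

# Trainable policy plus a frozen reference copy of the pre-trained weights.
policy = TrajectoryPolicy()
ref_policy = TrajectoryPolicy()
ref_policy.load_state_dict(policy.state_dict())
for p in ref_policy.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(policy.parameters(), lr=4.5e-5)  # lr from the abstract
component_weights = {"no_collision": 1.0, "drivable_area": 1.0,
                     "progress": 0.5, "comfort": 0.25}         # assumed weights
kl_coef = 0.1                                                  # assumed KL strength

obs = torch.randn(32, 64)          # placeholder batch of observations
dist = policy(obs)
actions = dist.sample()            # indices of selected candidate trajectories
rewards = decomposed_rewards(actions)

# Each component gets its own baseline, so credit assignment is done per
# objective instead of through a single composite return.
advantage = sum(w * (rewards[k] - rewards[k].mean())
                for k, w in component_weights.items())

# REINFORCE loss plus a KL term that keeps the fine-tuned policy close to the
# pre-trained one (lightweight update, limited forgetting).
ref_dist = ref_policy(obs)
kl = torch.distributions.kl_divergence(dist, ref_dist).mean()
loss = -(advantage.detach() * dist.log_prob(actions)).mean() + kl_coef * kl

optimizer.zero_grad()
loss.backward()
optimizer.step()
```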
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4791