Keywords: Autonomous Driving, Reinforcement Learning
Abstract: Autonomous driving planning requires synthesizing perceptual cues into safe and efficient trajectories, yet current end-to-end models trained solely by imitation learning often suffer from limited behavioral diversity and distributional mismatch.
To this end, we introduce READ, a reinforcement learning-based fine-tuning framework that substantially enhances pre-trained end-to-end driving models through structured policy refinement.
Our approach is grounded in the observation that although certain models already support diverse trajectory generation, their output probability distributions over the action space are biased toward imitation rather than optimality.
READ efficiently recalibrates these distributions using lightweight RL updates, avoiding catastrophic forgetting while promoting high-reward behaviors.
Our approach also incorporates a novel reward decomposition strategy, designed to resolve the inefficiency of training with a composite reward signal.
Such signals obscure which behaviors lead to success, making it difficult for the policy to discern and reinforce high-reward patterns.
Our method decomposes the reward into semantically clear components, each providing a well-defined optimization objective, enabling the policy to independently learn and balance distinct objectives.
This leads to more efficient exploration, better credit assignment, and significantly improved convergence compared to using a single comprehensive reward.
Evaluated on the NavSim benchmark with DiffusionDrive as the baseline, READ significantly improves driving performance with minimal fine-tuning: it raises the PDMScore from 87.7 to 88.8 after only 2 epochs of training at a learning rate of $4.5 \times 10^{-5}$, compared with the original 100 training epochs at a rate of $6.4 \times 10^{-4}$.
Further open-loop evaluations on the nuScenes dataset show that READ reduces the collision rate of the original DiffusionDrive-nusc baseline by over 60\% (from 0.088\% to 0.031\%) while maintaining a comparable L2 error (58.56 vs. 58.32) after a single epoch of training (about 20 minutes), demonstrating its capacity to surpass expert demonstrations and learn safer driving policies.
READ provides an efficient and effective pathway for reinforcement learning-based optimization in safety-critical autonomous driving systems.
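To make the reward-decomposition idea concrete, the minimal sketch below shows how per-component advantages could drive separate policy-gradient terms during fine-tuning; the component names (no_collision, progress, comfort), weights, per-component batch-mean baseline, and update rule are illustrative assumptions for exposition, not READ's actual implementation.

```python
# Illustrative sketch of decomposed-reward policy-gradient fine-tuning.
# Component names, weights, and the baseline choice are assumptions, not the paper's method.
import torch

def decomposed_pg_loss(log_probs, component_rewards, weights):
    """log_probs: (B,) log-probabilities of sampled trajectories.
    component_rewards: dict of name -> (B,) reward for that semantic component.
    weights: dict of name -> scalar balancing coefficient."""
    loss = torch.zeros(())
    for name, r in component_rewards.items():
        # Each component gets its own advantage (reward minus its batch mean),
        # so the policy receives a separate, well-defined signal per objective
        # instead of one entangled composite reward.
        adv = r - r.mean()
        loss = loss - weights[name] * (adv.detach() * log_probs).mean()
    return loss

# Usage example with placeholder values (hypothetical components).
log_probs = torch.randn(8, requires_grad=True)
rewards = {
    "no_collision": torch.randint(0, 2, (8,)).float(),  # hypothetical safety term
    "progress":     torch.rand(8),                       # hypothetical progress term
    "comfort":      torch.rand(8),                       # hypothetical comfort term
}
loss = decomposed_pg_loss(log_probs, rewards,
                          {"no_collision": 1.0, "progress": 0.5, "comfort": 0.2})
loss.backward()
```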
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 4791