Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: human driving preference, generative model, RLHF
TL;DR: TrajHF finetunes generative trajectory models with human feedback to enable personalized trajectory planning, achieving results comparable to the state of the art on the NavSim benchmark.
Abstract: Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of personalized driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving styles. TrajHF incorporates a multi-conditional denoiser and reinforcement learning from human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves performance comparable to the state of the art on the NavSim benchmark, setting a new paradigm for personalized and adaptable trajectory generation in autonomous driving.
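The abstract does not detail TrajHF's fine-tuning objective, so the sketch below is only an illustrative stand-in: a minimal, simplified Diffusion-DPO-style preference loss applied to a toy style-conditioned trajectory denoiser, showing one common way a generative trajectory model could be finetuned from pairwise human feedback. All names (StyleConditionedDenoiser, preference_finetune_step), dimensions, and the loss form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleConditionedDenoiser(nn.Module):
    """Toy noise-prediction network for a flattened ego trajectory,
    conditioned on scene features and a driving-style embedding.
    Hypothetical stand-in for TrajHF's multi-conditional denoiser."""
    def __init__(self, traj_dim=16, scene_dim=32, style_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + scene_dim + style_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, noisy_traj, scene, style):
        return self.net(torch.cat([noisy_traj, scene, style], dim=-1))

def preference_finetune_step(model, optimizer, scene, style,
                             preferred, rejected, beta=0.1):
    """One finetuning step on a human preference pair: lower the
    denoising error on the preferred trajectory relative to the
    rejected one. Simplified: a full Diffusion-DPO loss would also
    subtract the errors of a frozen reference model."""
    noise = torch.randn_like(preferred)
    err_w = ((model(preferred + noise, scene, style) - noise) ** 2).mean(dim=-1)
    err_l = ((model(rejected + noise, scene, style) - noise) ** 2).mean(dim=-1)
    loss = -F.logsigmoid(-beta * (err_w - err_l)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random stand-in data:
model = StyleConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
batch = 4
scene = torch.randn(batch, 32)      # encoded scene context
style = torch.randn(batch, 8)       # e.g. "sporty" vs. "cautious" embedding
preferred = torch.randn(batch, 16)  # trajectory the annotator preferred
rejected = torch.randn(batch, 16)   # trajectory the annotator rejected
print(preference_finetune_step(model, opt, scene, style, preferred, rejected))
```

The style embedding is what makes the finetuning "personalized" in this sketch: the same denoiser can be steered toward different driving preferences at inference time by swapping the conditioning vector.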
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 15047