Enhancing Human Trajectory Prediction with Reinforcement Learning from Quantified Human Preferences

Published: 2025, Last Modified: 25 Jan 2026, PRCV (7) 2025, CC BY-SA 4.0
Abstract: We improve human trajectory prediction by introducing Reinforcement Learning from Human Feedback (RLHF) and rejection sampling. To quantify human preferences, we parameterize and pre-train a diffusion backbone that models realistic human behaviors in the latent space. We then derive a diffusion score from the latent trajectory features, indicating how well predicted trajectories align with human decisions. Using the diffusion score as a reward, we refine the prediction model to generate trajectories preferred by humans, and we further apply rejection sampling to select the highest-scored trajectories for training. We validate our approach through numerical experiments, human evaluations, and visualizations, showing a 15% reduction in positional deviation and a 20% increase in alignment with human preferences. The proposed diffusion score achieves a 67% Top-5 hit rate in retrieving the candidate path with the least deviation from the true human trajectory, demonstrating its capacity to guide realistic decision-making.
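The abstract's rejection-sampling step can be illustrated with a minimal sketch: score a batch of candidate trajectories and keep only the top-k for training. The `diffusion_score` function below is a hypothetical placeholder (a negative distance to a zero prior mean in latent space), not the paper's pre-trained diffusion backbone, and the tensor shapes are purely illustrative.

```python
import torch


def diffusion_score(latent_traj: torch.Tensor) -> torch.Tensor:
    """Placeholder for the paper's diffusion score: here we score each
    latent trajectory by its negative mean squared distance to a zero
    prior mean, standing in for the pre-trained diffusion backbone."""
    prior_mean = torch.zeros_like(latent_traj)
    return -((latent_traj - prior_mean) ** 2).mean(dim=(-2, -1))


def rejection_sample(candidates: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Keep the top-k candidate trajectories by diffusion score; the
    selected set would then serve as preferred samples during training."""
    scores = diffusion_score(candidates)               # shape: (N,)
    topk = torch.topk(scores, k=min(k, len(scores)))   # highest-scored paths
    return candidates[topk.indices]


# Example: 32 candidate latent trajectories, 12 timesteps, 8-dim latents.
candidates = torch.randn(32, 12, 8)
preferred = rejection_sample(candidates, k=5)
print(preferred.shape)  # torch.Size([5, 12, 8])
```

In the full method, the same score would also act as the RLHF reward signal for refining the prediction model; this sketch shows only the candidate-selection step.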