Abstract: Modeling expert driving behavior is crucial for implementing human-like autonomous driving. In this paper, we propose a new sampling-based Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) framework. It uses naturalistic human driving data to train a reward model, which then evaluates driving behaviors through the rewards of sampled candidate trajectories. The framework uses deep neural networks to learn the mapping from features to rewards, which offers greater fitting capacity than traditional linear reward functions. To simplify computation of the partition function in the MEDIRL algorithm, it adopts a polynomial trajectory sampler for long-term decision making and a dynamic window trajectory sampler for short-term planning. In addition, the framework estimates the probability of driving behaviors by computing the likelihood of each sampled candidate trajectory from its reward value. Comparative experiments on the NGSIM US-101 Highway dataset demonstrate the superiority of the proposed model in personalizing reward functions and the applicability of the proposed method to modeling driving behaviors across various time horizons.
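The abstract's key computational idea is that the partition function is approximated over a finite set of sampled candidate trajectories, so each candidate's likelihood is a softmax of its reward. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each candidate set contains the recorded human trajectory at index `expert_idx`. For brevity it also substitutes a linear reward over hypothetical features for the paper's deep network, which keeps the chain rule visible. All function names and feature values here are illustrative assumptions.

```python
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))), i.e. log Z over the sampled candidates."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def trajectory_probabilities(rewards: Sequence[float]) -> List[float]:
    """P(tau_i) = exp(R_i) / Z, with Z approximated by the sum over sampled candidates."""
    log_z = log_sum_exp(rewards)
    return [math.exp(r - log_z) for r in rewards]


def expert_log_likelihood(rewards: Sequence[float], expert_idx: int) -> float:
    """log P(expert trajectory) under the sample-based maximum-entropy model."""
    return rewards[expert_idx] - log_sum_exp(rewards)


def reward_gradients(rewards: Sequence[float], expert_idx: int) -> List[float]:
    """d log P(expert) / d R_i = 1[i == expert_idx] - P(tau_i).

    With a deep reward network, these per-candidate values would be
    backpropagated through the network instead of a linear model.
    """
    probs = trajectory_probabilities(rewards)
    return [(1.0 if i == expert_idx else 0.0) - p for i, p in enumerate(probs)]


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Stand-in reward model (assumption): R = w . f."""
    return sum(w * f for w, f in zip(weights, features))


def train_linear_reward(
    candidate_features: Sequence[Sequence[float]],
    expert_idx: int,
    lr: float = 0.1,
    steps: int = 100,
) -> List[float]:
    """Gradient ascent on the expert log-likelihood for the linear stand-in."""
    n_feat = len(candidate_features[0])
    weights = [0.0] * n_feat
    for _ in range(steps):
        rewards = [linear_reward(weights, f) for f in candidate_features]
        grads_r = reward_gradients(rewards, expert_idx)
        # Chain rule: dL/dw_k = sum_i (dL/dR_i) * f_ik = f_expert,k - E_P[f_k]
        for k in range(n_feat):
            weights[k] += lr * sum(g * f[k] for g, f in zip(grads_r, candidate_features))
    return weights


if __name__ == "__main__":
    # Hypothetical features per sampled candidate, e.g. [comfort, collision risk].
    candidates = [[0.9, 0.1], [0.5, 0.6], [0.2, 0.9]]
    w = train_linear_reward(candidates, expert_idx=0)
    print(trajectory_probabilities([linear_reward(w, f) for f in candidates]))
```

Because the candidate set is finite, the normalizer is a plain sum, and the gradient reduces to the familiar maximum-entropy form: expert features minus their expectation under the sampled distribution. In this sketch, the sampler's job is only to supply candidates that cover the plausible behaviors, so that the finite sum is a reasonable stand-in for the true partition function.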
External IDs: dblp:conf/iros/ShiZCZX25