Keywords: Autonomous Driving, Human-in-the-Loop Simulation, Multi-modal Planning and Evaluation
Abstract: Modeling the nuanced, multimodal nature of human driving remains a core challenge for autonomous systems, as existing methods often fail to capture the diversity of plausible behaviors in complex real-world scenarios. In this work, we introduce a novel benchmark and end-to-end planner for modeling realistic multimodality in autonomous driving decisions.
We propose a Gaussian Mixture Model (GMM)-based diffusion model designed to explicitly capture human-like, multimodal driving decisions in diverse contexts. Our model achieves state-of-the-art performance on current benchmarks, but these results also expose weaknesses in standard evaluation practices, which rely on a single ground-truth trajectory or on coarse closed-loop metrics and often penalize diverse yet plausible alternatives. To address this limitation, we further develop a human-in-the-loop simulation benchmark that enables finer-grained evaluation and measures multimodal realism in challenging driving settings. Our code, models, and benchmark data will be released to promote more accurate and human-aware evaluation of autonomous driving models.
Spotlight: mp4
Submission Number: 55