Keywords: Human Intention, Large Language Model, Action Prediction, End-to-End Autonomous Driving
Abstract: While end-to-end autonomous driving has achieved remarkable progress in geometric control, current systems remain constrained by a command-following paradigm that relies on simple navigational instructions.
Transitioning to genuinely intelligent agents requires the capability to interpret and fulfill high-level, abstract human intentions.
However, this advancement is hindered by the lack of dedicated benchmarks and semantic-aware evaluation metrics.
In this paper, we formally define the task of Intention-Driven End-to-End Autonomous Driving and present Intention-Drive, a comprehensive benchmark designed to bridge this gap.
We construct a large-scale dataset featuring complex natural language intentions paired with high-fidelity sensor data.
To overcome the limitations of conventional trajectory-based metrics, we introduce Imagined Future Alignment (IFA), a novel evaluation protocol that uses generative world models to assess whether a human goal is semantically fulfilled, going beyond mere geometric accuracy.
Furthermore, we explore the solution space by proposing two distinct paradigms: an end-to-end vision-language planner and a hierarchical agent-based framework.
Experiments reveal a critical gap: existing models exhibit satisfactory driving stability but struggle significantly with intention fulfillment.
Notably, the two proposed frameworks align more closely with human intentions.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: Large Language Model, Autonomous Driving
Languages Studied: English
Submission Number: 7688