Keywords: Autonomous Driving, Safety Evaluation, Multi-agent Systems, Context Engineering
Abstract: Autonomous Driving (AD) faces persistent safety challenges from unforeseen long-tailed driving scenarios that require evaluation at massive scale. Existing solutions, such as road tests, scenario-based simulation, and rule-based verification, remain insufficient: they either fail to uncover hazardous edge cases and inherit unsafe habits from human data, or lack adaptability across regions. Additionally, current approaches often provide limited contextual understanding, making it challenging to generate interpretable explanations of unsafe behavior. To address these gaps, we introduce **DriveEval**, a context-aware multi-agent framework for autonomous driving safety evaluation. It leverages the comprehensive knowledge and reasoning ability of large language models (LLMs) to understand traffic scenes and detect edge cases, while applying context engineering to ground LLMs in external knowledge, including traffic rules and historical accident data, for interpreting unsafe driving behaviors. The framework is organized as a multi-agent workflow comprising a Data Annotator, Scene Extractor, Rule Checker, Accident Retriever, and Driving Assessor, each handling specialized functions.
This multi-agent design improves precision through specialization, enables modular expansion with new knowledge sources, and allows the most suitable model to be chosen for each task, offering stronger performance than a single monolithic agent.
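The five-agent workflow described above could be organized as a simple sequential pipeline. The Python sketch below is purely illustrative: only the agent names come from the abstract, and all agent logic is stubbed (in the actual framework each stage would call an LLM with its own context):

```python
# Illustrative sketch of a DriveEval-style multi-agent pipeline.
# Agent names follow the abstract; all behavior here is stubbed.

def data_annotator(video):
    # Annotate raw sensor frames (stub: tag every frame with a detected object).
    return [{"frame": f, "objects": ["car"]} for f in video]

def scene_extractor(annotations):
    # Summarize annotated frames into a structured scene description.
    return {"scene": "urban intersection", "n_frames": len(annotations)}

def rule_checker(scene):
    # Check the scene against traffic rules (stub: flag one violation type).
    return ["failed_to_yield"] if scene["scene"] == "urban intersection" else []

def accident_retriever(scene):
    # Retrieve similar historical accidents for grounding (stubbed record).
    return [{"id": 101, "type": "intersection collision"}]

def driving_assessor(violations, accidents):
    # Fuse rule violations and accident context into a final assessment.
    risk = "high" if violations else "low"
    return {"risk": risk, "violations": violations, "evidence": accidents}

def drive_eval(video):
    # Chain the specialized agents into one evaluation pass.
    annotations = data_annotator(video)
    scene = scene_extractor(annotations)
    return driving_assessor(rule_checker(scene), accident_retriever(scene))
```

Because each stage exposes a narrow interface, a stage can be swapped for a different model or extended with a new knowledge source without touching the rest of the pipeline, which is the modularity argument made above.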
Experiments show that DriveEval can evaluate sensor data, such as dashcam video, to identify safety risks and recommend actionable improvements. Its assessments align closely with human annotations, demonstrating that context-aware evaluation provides interpretable safety assurance.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 20221