eHMI-Action Scoring: Evaluating LLMs' eHMI Message-to-Action Translation Capability

ACL ARR 2025 February Submission 1970 Authors

14 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · License: CC BY 4.0
Abstract: External human-machine interfaces (eHMIs) play a critical role as communication mediators between autonomous vehicles and other road users. However, current eHMI studies are typically evaluated in predefined scenarios that convey fixed messages through fixed action mappings, which limits their applicability in real-world environments where interactions are dynamic. To address this limitation, we introduce Large Language Models (LLMs) into eHMI action generation, exploiting their impressive generative ability and versatility across tasks. This raises a key question: can an LLM-driven eHMI system consistently translate intended messages into actions that other road users can accurately interpret? To answer it, we created an eHMI-Action Scoring dataset consisting of eight interaction scenarios with intended messages, four eHMI modalities, ten actions generated by LLMs and human designers for each scenario-modality pair, rendered animations of these actions, and human scores evaluating the actions shown in the animations. Furthermore, we asked visual LLMs to evaluate these action clips; their scores are consistent with those provided by humans, suggesting that automated scoring is feasible. Finally, we benchmarked the capabilities of other state-of-the-art LLMs.
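To make the automated-scoring idea concrete, here is a minimal sketch of how a visual LLM might be asked to rate one rendered action clip. It assumes frames have already been extracted from the animation and uses the OpenAI Python client; the model name (gpt-4o), the prompt wording, and the 1-5 scale are illustrative assumptions, not the paper's actual protocol.

```python
# Hypothetical sketch: scoring a rendered eHMI action clip with a vision LLM.
# Assumes frames were pre-extracted from the animation; the model name,
# prompt, and 1-5 scale are illustrative, not the paper's actual protocol.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_frame(path: Path) -> str:
    """Return a data URL for one JPEG frame of the action animation."""
    b64 = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"


def score_action(frame_paths: list[Path], intended_message: str) -> str:
    """Ask a vision LLM how well the shown action conveys the message."""
    content = [
        {
            "type": "text",
            "text": (
                "These frames show an autonomous vehicle's eHMI action. "
                f"Intended message: '{intended_message}'. On a scale of "
                "1 (uninterpretable) to 5 (perfectly clear), how well does "
                "the action convey this message to other road users? "
                "Answer with the number only."
            ),
        },
    ] + [
        {"type": "image_url", "image_url": {"url": encode_frame(p)}}
        for p in frame_paths
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content


# Example: frames sampled from one rendered animation clip
# print(score_action(sorted(Path("clip_frames").glob("*.jpg")), "Please cross"))
```

Repeating such a query per scenario-modality-action clip and comparing the returned scores against the human ratings in the dataset is one plausible way to measure the human-LLM score consistency the abstract reports.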
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Resources and Evaluation, NLP Applications, Special Theme Track
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 1970