How Do We Team Up? Human-Machine Co-driving Style Assessment Through Visual Dynamic Analysis and Vision-Language Model

Published: 01 Jan 2025 · Last Modified: 24 Jul 2025 · HCI (18) 2025 · CC BY-SA 4.0
Abstract: As autonomous driving technology advances, understanding human-machine co-driving styles becomes increasingly crucial. The integration of autonomous systems influences traditional human driving styles, resulting in diverse co-driving dynamics. This paper focuses on generating comprehensive driver behavior reports with a Vision-Language Model (VLM) to assess human-machine co-driving styles. Such assessment is challenged by complex driving scenarios, individual differences, and unclear style indicators. To address these challenges, we introduce the Adaptive Virtual Co-driving Assessing (AVCA) framework, which employs a virtual reality platform and combines deep learning models with VLMs. By simulating critical driving scenarios in virtual environments, the framework enables efficient and cost-effective collection of multimodal signals. Deep learning models process temporal and visual data to identify key factors that affect driving safety and efficiency, such as vehicle dynamics and visual attention distribution. To enhance interpretability, the framework combines these feature abstractions with Co-driving Assessing Thoughts, allowing VLMs to generate driver behavior reports that are clear and actionable for human drivers. Experimental results demonstrate the framework’s capabilities in multimodal signal extraction and analysis across diverse participants. By integrating multimodal fusion with model collaboration, the AVCA framework provides personalized assessments and feedback, fostering safer and more efficient driving practices.
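To make the described pipeline concrete, the minimal Python sketch below illustrates one way the deep-learning feature abstractions (vehicle dynamics and visual attention summaries) could be composed with a Co-driving Assessing Thoughts-style prompt before being passed to a VLM. The feature names, prompt template, and step structure are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class CoDrivingFeatures:
    """Feature abstractions a deep-learning front end might produce (names are illustrative)."""
    mean_speed_kmh: float           # vehicle dynamics summary
    steering_reversal_rate: float   # steering reversals per minute, a smoothness proxy
    takeover_reaction_time_s: float # time to respond after a takeover request
    gaze_on_road_ratio: float       # share of gaze fixations on the road ahead


def build_assessment_prompt(f: CoDrivingFeatures, scenario: str) -> str:
    """Compose a Co-driving Assessing Thoughts-style prompt for a VLM.

    The step-by-step structure below is an assumed layout for such a prompt;
    the paper's actual template may differ.
    """
    return (
        f"Scenario: {scenario}\n"
        f"Vehicle dynamics: mean speed {f.mean_speed_kmh:.1f} km/h, "
        f"steering reversal rate {f.steering_reversal_rate:.2f}/min, "
        f"takeover reaction time {f.takeover_reaction_time_s:.2f} s.\n"
        f"Visual attention: {f.gaze_on_road_ratio:.0%} of gaze on the road.\n"
        "Step 1: Judge whether the driver's responses were timely and stable.\n"
        "Step 2: Judge whether attention allocation matched the scenario's demands.\n"
        "Step 3: Classify the co-driving style (e.g., cautious, assertive, inattentive).\n"
        "Step 4: Write a short, actionable behavior report for the driver."
    )


if __name__ == "__main__":
    features = CoDrivingFeatures(
        mean_speed_kmh=62.4,
        steering_reversal_rate=3.1,
        takeover_reaction_time_s=1.8,
        gaze_on_road_ratio=0.74,
    )
    prompt = build_assessment_prompt(features, scenario="sudden cut-in on a two-lane highway")
    # In the full framework, this prompt would be sent to a VLM together with
    # scene imagery so the model can generate the driver behavior report.
    print(prompt)
```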