Trustworthy Driver State Perception via Contextual Interaction-Driven Evidential Vision-Language Fusion in Vehicular Cyber-Physical Systems
Abstract: A vision-driven driver monitoring system plays a vital role in vehicular cyber-physical systems (VCPS) in guaranteeing driving safety. Recent advances focus on deep learning-based methods for driver monitoring, which benefit from the powerful capability of data-driven feature extraction. Although existing driver state monitoring methods achieve acceptable performance, there is still a gap between these emerging techniques and actual application scenarios. First, human-centric visual appearances are not comprehensively modeled to represent driver states, so the contextual interaction of behaviors is ignored. Second, the inherent uncertainty of driving situations is not considered, and unreliable samples can lead to untrustworthy results. In this paper, we focus on a vision-based driver state monitoring method, proposing trustworthy driver state perception (TDSP) via human-centric contextual interaction-driven evidential vision-language fusion in VCPS. Specifically, a vision-language model-based architecture is first extended in the temporal dimension to represent visual human-centric contextual interactions, and a vision-language consistency loss is designed to mitigate the gap between visual and textual representations. Then, an evidence-based learning method is introduced to jointly conduct classification and uncertainty estimation for driver states. Furthermore, to comprehensively model the human-centric contextual interactions within the evidence-based paradigm, a Dempster-Shafer theory-based combination rule is introduced to fuse the visual and textual representations. Extensive experiments are conducted on two public benchmarks, where the superiority of TDSP is demonstrated in comparison with state-of-the-art methods.
In recognizing dangerous states, TDSP achieves 85.41% accuracy and an 83.12% F1 score, outperforming the state-of-the-art methods by 4.68% and 3.99%, respectively. Moreover, we validate the reliability of TDSP against noisy data for VCPS. The code will be made public at https://github.com/w64228013/TDSP.
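The abstract's fusion step can be illustrated with a minimal sketch of Dempster's combination rule, assuming the belief/uncertainty parameterization commonly used in evidential deep learning (per-class belief masses plus one uncertainty mass, summing to 1 for each source); the exact TDSP formulation may differ, and the function name is ours:

```python
import numpy as np

def dempster_combine(b1, u1, b2, u2):
    """Fuse two evidence sources (e.g., visual and textual) with Dempster's rule.

    b1, b2 : per-class belief masses, with sum(b) + u == 1 for each source.
    u1, u2 : uncertainty mass of each source.
    Returns the fused belief masses and fused uncertainty (also summing to 1).
    """
    b1, b2 = np.asarray(b1, dtype=float), np.asarray(b2, dtype=float)
    # Conflict: total mass the two sources assign to *different* classes.
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - conflict)  # renormalize after discarding conflict
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u

# Two sources that agree on class 0: fusion sharpens belief and shrinks uncertainty.
b, u = dempster_combine([0.6, 0.2], 0.2, [0.5, 0.3], 0.2)
```

With the example inputs above, the fused masses still sum to 1 and the fused uncertainty is smaller than either source's, which is the intuition behind combining agreeing visual and textual evidence.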
DOI: 10.1109/TITS.2025.3542447