GLMDriveNet: Global-local Multimodal Fusion Driving Behavior Classification Network

Wenzhuo Liu, Yan Gong, Guoying Zhang, Jianli Lu, Yunlai Zhou, Junbin Liao

Published: 01 Jan 2024, Last Modified: 15 Nov 2024, Venue: Eng. Appl. Artif. Intell. 2024, License: CC BY-SA 4.0

Abstract: Driving behavior classification plays an important role in many fields, such as Advanced Driving Assistance Systems (ADAS), traffic safety, and energy saving. In this paper, we propose a Global–local Multimodal Fusion Driving Behavior Classification Network (GLMDriveNet), which classifies driver behavior as normal, aggressive, or drowsy driving. First, we design a Global–local Interaction Channel Attention Module (GLI-CAM) to extract effective features from both the roadside image and a spectrogram generated from the vehicle speeds at the current prediction time and over the preceding four seconds. Furthermore, a learnable positional embedding is introduced to fuse the global and local information of the channels so that the extracted features are screened more effectively. Second, we propose a Multi-scale Feature Representation Fusion Module (MS-FRFM) that associates the high-scale and low-scale information of images and spectrograms and assigns different importance to each modality, biasing the network toward the more useful modal information. Our model is evaluated on the public UAH-DriveSet dataset and achieves the best performance among the compared state-of-the-art methods (98.4% F1-score on all roads, 97.4% F1-score on the motorway road, and 99.8% F1-score on the secondary road). The model runs at 142 FPS and generalizes well, as verified through extensive experiments on multiple datasets. The code is available at https://github.com/liuwenzhuo1/GLMDrivenet.
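
To make the spectrogram input more concrete, the following is a minimal sketch of turning a short window of vehicle speeds into a magnitude spectrogram with a naive short-time DFT. It is not the authors' pipeline: the abstract gives no sampling rate, frame length, hop size, or window function, so the 10 Hz rate, the 20-sample Hann-windowed frames, the hop of 5 samples, and the synthetic speed trace are all assumptions chosen for illustration.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram computed with a naive short-time DFT.

    Returns a list of frames; each frame holds frame_len // 2 + 1
    non-negative magnitudes (bins from DC up to the Nyquist frequency).
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Apply the window to the current frame.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Assumed values, not taken from the paper.
SAMPLE_RATE_HZ = 10      # hypothetical speed sampling rate
LOOKBACK_SECONDS = 4     # "previous four seconds" from the abstract

# Synthetic speed trace (km/h): the four preceding seconds plus the
# sample at the current prediction time.
num_samples = SAMPLE_RATE_HZ * LOOKBACK_SECONDS + 1
speeds = [
    80.0 + 5.0 * math.sin(2 * math.pi * t / SAMPLE_RATE_HZ)
    for t in range(num_samples)
]

spec = spectrogram(speeds, frame_len=20, hop=5)
print(len(spec), len(spec[0]))  # number of time frames, number of frequency bins
```

In practice the resulting time-frequency grid would be rendered or resized as an image before being passed to a convolutional network alongside the roadside image; a library routine such as an FFT-based spectrogram would replace the naive DFT used here for readability.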