Abstract: Distracted driving is one of the leading causes
of road accidents. With the recent introduction of advanced
driver assistance systems and Level 2 (L2) vehicles, the role of driver
attention has gained renewed interest. It is imperative for
vehicle manufacturers to develop robust systems that can identify
distractions and aid in preventing such accidents in highly
automated vehicles. This paper focuses on studying secondary driver behaviors and their relative complexity in order to develop a guide for auto manufacturers. In recent years, a few driver
secondary action datasets and deep learning algorithms have
been created to address this problem. Despite their success in many domains, Convolutional Neural Network (CNN) based deep learning methods struggle to capture the overall context of an image, focusing instead on specific local features. We present
the use of Video Transformers on two challenging datasets, one of which is a low-quality grayscale dataset. We also
demonstrate how the novel concept of a Visual Dictionary can be
used to understand the structural components of any secondary
behavior. Finally, we validate different components of the Visual Dictionary by studying the attention modules of the transformer-based model, incorporating explainability into the computer vision model. An activity is decomposed into multiple small actions and attributes, and the corresponding attention patches are highlighted in the input frame. Our code is available at
github.com/VTTI/driver-secondary-action-recognition