Abstract: Distracted driving is one of the leading causes
of road accidents. With the recent introduction of advanced
driver assistance systems and Level 2 (L2) vehicles, the role of driver
attention has gained renewed interest. It is imperative for
vehicle manufacturers to develop robust systems that can identify
distractions and aid in preventing such accidents in highly
automated vehicles. This paper focuses on studying secondary driver behaviors and their relative complexity in order to develop a guide for auto manufacturers. In recent years, a few driver
secondary action datasets and deep learning algorithms have
been created to address this problem. Despite their success in many domains, Convolutional Neural Network (CNN) based deep learning methods struggle to capture the overall context of an image, focusing instead on specific local features. We present
the use of Video Transformers on two challenging datasets, one of which is a low-quality grayscale dataset. We also
demonstrate how the novel concept of a Visual Dictionary can be
used to understand the structural components of any secondary
behavior. Finally, we validate different components of the Visual Dictionary by studying the attention modules of the transformer-based model, incorporating explainability into the computer vision model. An activity is decomposed into multiple small actions and attributes, and the corresponding attention patches are highlighted in the input frame. Our code is available at
github.com/VTTI/driver-secondary-action-recognition