Detecting Improper Driving via Frozen CLIP

Published: 01 Jan 2023, Last Modified: 27 Oct 2024, Venue: SIU 2023, License: CC BY-SA 4.0
Abstract: Road accidents have irreversible consequences despite being predictable and preventable. Since one of their main causes is the human factor, which includes driver drowsiness and distraction, researchers have proposed methods to detect improper driving using computer vision models. Contrastive Language-Image Pre-training (CLIP) learns image representations with a high generalization capacity under the supervision of natural language. Our method adopts a frozen CLIP model to extract features, which are then decoded into drowsiness and distraction cues. It processes the video stream obtained from a camera and effectively detects driver drowsiness and distraction. Experiments on the NTHU Driver Drowsiness Detection and DMD Driver Monitoring datasets indicate that combining a frozen CLIP model with a transformer decoder outperforms methods that use a 2-dimensional CNN concatenated with an LSTM, as well as methods that use 3-dimensional CNNs.
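The pipeline described in the abstract (frozen per-frame features followed by a trainable transformer decoder) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the abstract does not specify layer counts, feature sizes, or how the decoder is queried. `FrozenFrameEncoder` is a hypothetical stand-in for a pretrained CLIP image encoder so the example runs without downloading weights, and the single learnable query, the two-class head, and the omission of temporal positional encodings are all assumptions made for brevity.

```python
import torch
import torch.nn as nn


class FrozenFrameEncoder(nn.Module):
    """Hypothetical stand-in for a pretrained CLIP image encoder.

    In practice this would be replaced by a real CLIP visual backbone;
    here it is a single linear layer over flattened 3x32x32 frames.
    """

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(3 * 32 * 32, feat_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, 3, 32, 32) -> features: (N, feat_dim)
        return self.proj(frames.flatten(1))


class ClipVideoClassifier(nn.Module):
    """Frozen frame encoder + trainable transformer decoder (assumed layout)."""

    def __init__(self, encoder: nn.Module, feat_dim: int = 512,
                 num_classes: int = 2, num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        # Freeze the encoder: no gradients, evaluation mode.
        self.encoder = encoder.eval().requires_grad_(False)
        # One learnable query token that attends over all frame features.
        self.query = nn.Parameter(torch.zeros(1, 1, feat_dim))
        layer = nn.TransformerDecoderLayer(
            d_model=feat_dim, nhead=num_heads,
            dim_feedforward=4 * feat_dim, batch_first=True,
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(feat_dim, num_classes)

    def train(self, mode: bool = True):
        # Keep the frozen encoder in eval mode even while training the rest.
        super().train(mode)
        self.encoder.eval()
        return self

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, 32, 32)
        b, t = clip.shape[:2]
        with torch.no_grad():
            feats = self.encoder(clip.flatten(0, 1))  # (b * t, feat_dim)
        memory = feats.reshape(b, t, -1)              # (b, t, feat_dim)
        query = self.query.expand(b, -1, -1)          # (b, 1, feat_dim)
        decoded = self.decoder(tgt=query, memory=memory)
        return self.head(decoded[:, 0])               # (b, num_classes)


if __name__ == "__main__":
    model = ClipVideoClassifier(FrozenFrameEncoder(feat_dim=64),
                                feat_dim=64, num_heads=4)
    video = torch.randn(2, 8, 3, 32, 32)  # 2 clips of 8 random frames
    print(model(video).shape)
```

Only the query, decoder, and classification head receive gradients in this sketch, which reflects the general appeal of a frozen backbone: the trainable part stays small while the pretrained representation is left untouched.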