Integration of Large Vision Models in Driver Monitoring Systems: Compressing and Distilling for Real-Time Automotive Applications

Published: 09 Oct 2024 · Last Modified: 19 Nov 2024 · Compression Workshop @ NeurIPS 2024 · CC BY 4.0
Keywords: Knowledge distillation, Model compression, Edge devices, Autonomous Driving, Driver monitoring
TL;DR: This study optimizes real-time driver monitoring by distilling the Florence-2 model into a smaller, faster model, balancing detection accuracy and computational efficiency for automotive applications.
Abstract: This study focuses on optimizing neural network architectures for real-time detection of driver facial bounding boxes. Initially, we trained the Florence-2 model, which demonstrated high accuracy but proved too large for real-time applications. To address this, we employed model distillation, using Florence-2 as a teacher to train a more compact DINOv2 model. Our aim was to maintain high detection accuracy while minimizing memory usage and inference time, making the solution viable for real-time deployment on GPU and NPU devices. We present a comparative analysis of model performance in terms of IoU scores, memory consumption, and inference times.
Submission Number: 55
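
The sketch below illustrates the kind of distillation setup the abstract describes: a teacher's predicted face boxes serve as regression targets for a lightweight student head on top of a DINOv2-style backbone, with IoU used both in the loss and for evaluation. This is a minimal, hypothetical PyTorch example, not the authors' implementation; the `StudentBBoxHead`, `distill_step`, and the assumption of a backbone returning a single feature vector are placeholders for illustration.

```python
import torch
import torch.nn as nn


def box_iou(pred, target):
    """IoU between corresponding [x1, y1, x2, y2] boxes, shape (N, 4)."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    return inter / union.clamp(min=1e-6)


class StudentBBoxHead(nn.Module):
    """Hypothetical regression head on a DINOv2-style backbone (e.g. ViT-S, feat_dim=384)."""

    def __init__(self, backbone, feat_dim=384):
        super().__init__()
        self.backbone = backbone  # assumed to map images (B, 3, H, W) -> features (B, feat_dim)
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 4),
            nn.Sigmoid(),  # normalized [x1, y1, x2, y2] in [0, 1]
        )

    def forward(self, images):
        feats = self.backbone(images)
        return self.head(feats)


def distill_step(student, images, teacher_boxes, optimizer):
    """One distillation step: regress the student toward the teacher's predicted boxes."""
    l1 = nn.SmoothL1Loss()
    optimizer.zero_grad()
    pred = student(images)
    # Combined objective: coordinate regression plus an IoU penalty on the same targets.
    loss = l1(pred, teacher_boxes) + (1.0 - box_iou(pred, teacher_boxes)).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `teacher_boxes` would be produced offline (or on the fly) by the larger Florence-2 teacher, so only the compact student needs to run on the GPU/NPU target at inference time; the same `box_iou` routine can then score the student against held-out ground-truth boxes for the reported IoU comparison.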