Keywords: Autonomous Driving, Multi-task Learning, Advanced Driver-Assistance Systems (ADAS), Deep Learning
TL;DR: Liquid Dino is a novel multi-task neural network for autonomous driving, achieving superior classification accuracy in driver behavior and contextual recognition tasks.
Abstract: In the realm of advanced driver-assistance systems (ADAS) and autonomous driving, the accurate classification of driver emotions, behaviors and contextual environments is critical for enhancing vehicle safety and user experience. This study investigates the performance of various neural network architectures across four distinct classification tasks: Emotion Recognition, Driver Behavior Recognition, Scene-Centric Context Recognition and Vehicle-Based Context Recognition, all of which rely on visual information captured through cameras. By utilizing camera-based data, we evaluate how different neural architectures handle visual inputs in these diverse contexts, thereby exploring the robustness and generalization of each model across real-world scenarios. We compare the performance of several state-of-the-art models and introduce a novel contribution that significantly improves classification accuracy in all areas. Our results demonstrate that the proposed Liquid Dino architecture achieves an overall average accuracy of 83.79%, outperforming other models in recognizing driver emotions, behaviors and contextual scenarios. These improvements underscore the potential of our proposed methods to contribute to the development of more reliable and responsive ADAS.
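The abstract names the Liquid Dino architecture but does not detail its internals. As a rough illustration of the multi-task setup it describes (one shared visual encoder serving four classification tasks), the following is a minimal sketch: a frozen DINO-style backbone is stood in for by a placeholder module, and each task gets its own linear head. All module names, feature dimensions and class counts here are assumptions for illustration, not the paper's actual design.

```python
# Minimal sketch of a shared-backbone multi-task classifier, assuming a
# DINO-style frozen feature extractor feeding four task-specific heads.
# All names, dimensions and class counts are illustrative assumptions,
# not the paper's actual "Liquid Dino" architecture.
import torch
import torch.nn as nn

class MultiTaskDriverModel(nn.Module):
    def __init__(self, feat_dim: int = 768,
                 n_emotions: int = 7, n_behaviors: int = 10,
                 n_scene_ctx: int = 5, n_vehicle_ctx: int = 5):
        super().__init__()
        # Stand-in for a pretrained vision backbone; in practice this could be
        # e.g. torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.GELU()
        )
        # One classification head per task; a liquid/continuous-time cell
        # could replace these linear heads when frames arrive as a sequence.
        self.heads = nn.ModuleDict({
            "emotion": nn.Linear(feat_dim, n_emotions),
            "behavior": nn.Linear(feat_dim, n_behaviors),
            "scene_context": nn.Linear(feat_dim, n_scene_ctx),
            "vehicle_context": nn.Linear(feat_dim, n_vehicle_ctx),
        })

    def forward(self, images: torch.Tensor) -> dict:
        feats = self.backbone(images)  # shared visual features
        return {task: head(feats) for task, head in self.heads.items()}

# Usage: one forward pass yields logits for all four tasks at once.
model = MultiTaskDriverModel()
logits = model(torch.randn(2, 3, 224, 224))  # batch of 2 RGB frames
print({task: out.shape for task, out in logits.items()})
```

The design choice this sketch reflects is the one the abstract implies: computing visual features once and sharing them across tasks, so that per-task cost reduces to a lightweight head rather than a full per-task network.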
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11339