NexusAD: Exploring the Nexus for Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving
This report presents our approach for the Corner Case Scene Understanding track of the Autonomous Driving Challenge at the ECCV 2024 Workshop. The advent of multimodal large language models (MLLMs) such as GPT-4V has showcased remarkable multimodal perception and understanding capabilities, even in dynamic street scenes. However, applying MLLMs to corner cases in autonomous driving remains largely unexplored. Using the CODA-LM dataset, which pairs visual images with textual descriptions and analyses of corner cases, we adopt InternVL-2.0 as our base model and perform domain-specific fine-tuning tailored to driving scenes. In this work, we enhance the model's use of spatial correlations within images by leveraging position and depth information to improve driving-scene perception. Additionally, we incorporate chain-of-thought reasoning for greater accuracy and develop an in-context learning mechanism based on scene-aware retrieval, which further refines the model's understanding. This comprehensive strategy achieved a final score of \textbf{68.97} on the leaderboard. Our code will be released at https://github.com/OpenVisualLab/NexusAD.
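To illustrate the scene-aware retrieval idea mentioned above, the sketch below shows one plausible way to select in-context examples: embed each training scene, rank scenes by cosine similarity to the test image's embedding, and prepend the annotations of the top-k matches to the prompt. The function names, embedding dimensionality, and random toy embeddings are assumptions for illustration only, not the implementation used in this work.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, bank: np.ndarray) -> np.ndarray:
    """Cosine similarity between a query vector and a bank of vectors."""
    query = query / (np.linalg.norm(query) + 1e-8)
    bank = bank / (np.linalg.norm(bank, axis=1, keepdims=True) + 1e-8)
    return bank @ query

def retrieve_in_context_examples(query_embedding: np.ndarray,
                                 bank_embeddings: np.ndarray,
                                 bank_annotations: list,
                                 k: int = 3) -> list:
    """Return annotations of the k training scenes most similar to the query,
    to be used as in-context examples in the prompt."""
    scores = cosine_similarity(query_embedding, bank_embeddings)
    top_k = np.argsort(scores)[::-1][:k]
    return [bank_annotations[i] for i in top_k]

# Toy usage: random vectors stand in for real scene embeddings (hypothetical).
rng = np.random.default_rng(0)
bank = rng.normal(size=(100, 512))                 # training-scene embeddings
annotations = [f"scene {i} description" for i in range(100)]
query = rng.normal(size=512)                       # test-scene embedding
print(retrieve_in_context_examples(query, bank, annotations, k=3))
```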