Keywords: Vision-Language Models (VLMs), Driving Uncertainty Prediction, Autonomous Driving Planning
TL;DR: We propose a framework that applies VisionLLM to autonomous driving by combining BEV images with textual prompts, enabling joint next-action prediction and uncertainty estimation under occlusion-rich conditions.
Abstract: In this work, we propose a novel framework for uncertainty prediction in autonomous driving using VisionLLM. Leveraging driving data collected from the CARLA simulator, we generate bird’s-eye-view (BEV) images paired with next driving actions and uncertainty scores. To emulate real-world challenges, occlusion masks are applied to the BEV images to represent regions of limited visibility caused by sensor constraints. Our model predicts both the next driving action and an uncertainty score, using additional image inputs to strengthen its reasoning under occlusion-rich conditions. By fine-tuning VisionLLM with Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA, we demonstrate the efficacy of our approach in handling occlusion-based uncertainty, paving the way for safer and more reliable decision-making in high-level driving automation systems.
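The abstract describes fine-tuning a vision-language model with PEFT/LoRA on occluded BEV images and textual prompts. Below is a minimal sketch of such a setup using the Hugging Face `peft` and `transformers` libraries; the checkpoint name, LoRA hyperparameters, and prompt/label format are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: LoRA fine-tuning of a vision-language model with Hugging Face PEFT.
# The model checkpoint, target modules, and data format below are assumptions for
# illustration; the paper's exact configuration may differ.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder VLM checkpoint (assumption)
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.float16)

# Wrap the base model with low-rank adapters; only the adapter weights are trained.
lora_config = LoraConfig(
    r=16,                                 # low-rank dimension
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections (a common choice)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only a small fraction of weights train

# One hypothetical training example: an occluded BEV image plus a textual prompt,
# with a target text encoding the next action and an uncertainty score.
# bev_image = load_occluded_bev(...)           # hypothetical helper
# prompt = "Given the BEV view, predict the next action and its uncertainty."
# target = "action: slow_down; uncertainty: 0.7"
```

The intent of the sketch is only to show how PEFT restricts training to lightweight adapters while the frozen VLM backbone processes the BEV image and prompt; the actual training loop, dataset, and output parsing would follow the authors' pipeline.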
Submission Number: 40