OmniPredict: GPT-4o Enhanced Multi-modal Pedestrian Crossing Intention Prediction

Published: 10 Oct 2024, Last Modified: 19 Nov 2024. AFM 2024 Poster. License: CC BY 4.0
Keywords: Pedestrian Crossing Intention Prediction; MLLM; Autonomous Driving
TL;DR: OmniPredict uses GPT-4o in a zero-shot setting to predict pedestrian crossing intention on the JAAD dataset, outperforming prior MLLM-based methods without retraining.
Abstract: Pedestrian crossing intention prediction is a crucial component for ensuring safety and responsible navigation in urban autonomous driving systems. Traditional vision-based methods struggle to generalize to unseen driving scenarios due to their dependence on training data. Multimodal Large Language Models (MLLMs) offer a novel approach to these challenges through their advanced reasoning capabilities. In this paper, we introduce OmniPredict, the first study to evaluate GPT-4o(mni), a cutting-edge MLLM, for predicting pedestrian crossing intentions. Using the JAAD dataset, our model achieves 67% prediction accuracy in a zero-shot setting, outperforming existing state-of-the-art MLLM methods by 17.5% without additional data or retraining. By integrating diverse contextual modalities and finely tuned prompts, our approach enhances prediction reliability and reduces uncertainty, thereby advancing safer driving environments.
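As a rough illustration of the zero-shot setup the abstract describes, the sketch below builds a multimodal chat payload combining a camera frame with scene context. The prompt wording, context fields (`bbox`, `speed_kmh`), and answer format are illustrative assumptions, not the paper's actual prompts; the message structure follows the OpenAI chat-completions format for image inputs.

```python
# Hedged sketch: one plausible way to phrase a zero-shot crossing-intention
# query to a multimodal model. Field names and prompt text are illustrative.
import base64


def build_intention_prompt(image_b64: str, context: dict) -> list:
    """Build a chat message list pairing a frame with textual scene context."""
    system = (
        "You are a driving-scene analyst. Given a camera frame and scene "
        "context, answer only 'crossing' or 'not crossing'."
    )
    user_text = (
        f"Pedestrian bounding box: {context['bbox']}. "
        f"Ego-vehicle speed: {context['speed_kmh']} km/h. "
        "Will the pedestrian cross in front of the vehicle?"
    )
    return [
        {"role": "system", "content": system},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_b64}"
                    },
                },
            ],
        },
    ]


# Example payload; a real run would base64-encode an actual JAAD frame and
# pass the messages to a chat-completions endpoint.
messages = build_intention_prompt(
    base64.b64encode(b"placeholder-frame-bytes").decode(),
    {"bbox": [412, 180, 468, 320], "speed_kmh": 30},
)
```

Constraining the answer to a two-label vocabulary in the system prompt makes the model's free-form output easy to parse into a binary crossing/not-crossing prediction.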
Submission Number: 28