Car4Cast: A Dataset and Benchmark for LLM-Based Motion Forecasting and Spatial Reasoning in Autonomous Driving
Keywords: Large Language Models, Vision Language Models, Dataset, Motion Forecasting, Autonomous Driving, Spatial Reasoning
TL;DR: A dataset and benchmark casting 3D motion forecasting as a spatial reasoning task for Large Language Models
Abstract: Recent advances in Large Language Models (LLMs) have shown promise in diverse reasoning tasks, yet their ability to perform structured spatio-temporal prediction remains underexplored. To address this, we introduce Car4Cast, a dataset and benchmark that casts 3D motion forecasting in autonomous driving as a spatial reasoning task requiring structured text generation. Car4Cast provides an evaluation suite tailored to the challenges of language-based motion prediction, covering both classical trajectory accuracy and LLM-specific failure modes such as malformed output and hallucinations. The benchmark also supports an optional visual modality, enabling future exploration of vision-language models in spatial reasoning tasks. Car4Cast is intended to drive progress toward spatially intelligent language models: it highlights the need for new methods and training paradigms that close this gap, and provides the data and evaluation tools to develop them.
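As a rough illustration of what evaluating both trajectory accuracy and output formatting can involve, the following minimal sketch parses a model's text output into waypoints and scores it against ground truth. Everything here is an assumption made for illustration: the "(x, y)" output format, the helper names `parse_trajectory` and `ade_fde`, the sample strings, and the reduction to 2D ground-plane coordinates are hypothetical and are not taken from Car4Cast itself. Average and final displacement error (ADE/FDE) are standard trajectory metrics, used here as stand-ins for the benchmark's own accuracy measures.

```python
import math
import re

# Hypothetical output format: one "(x, y)" pair per future timestep.
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(text, horizon):
    """Return a list of (x, y) floats, or None if the output is malformed.

    An output counts as malformed when it does not contain exactly
    `horizon` coordinate pairs (a simple proxy for a formatting failure).
    """
    points = [(float(x), float(y)) for x, y in _PAIR.findall(text)]
    return points if len(points) == horizon else None


def ade_fde(pred, gt):
    """Average and final displacement error for equal-length trajectories."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


# Made-up example data, in meters.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
good = "(1.0, 0.0) (2.0, 1.0) (3.0, 2.0)"
bad = "The car will keep going straight."

pred = parse_trajectory(good, horizon=3)
ade, fde = ade_fde(pred, gt)
format_failure = parse_trajectory(bad, horizon=3) is None
```

Keeping the parsing step separate from the metric computation lets formatting failures be counted on their own, rather than being silently folded into the displacement errors.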
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 9444