Car4Cast: A Dataset and Benchmark for LLM-Based Motion Forecasting and Spatial Reasoning in Autonomous Driving

Nicholas Argenziano; Lorenz K Muller; Zilong Deng; Linying Yao; Li Fan

17 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: Large Language Models, Vision Language Models, Dataset, Motion Forecasting, Autonomous Driving, Spatial Reasoning

TL;DR: A dataset and benchmark casting 3D motion forecasting as a spatial reasoning task for Large Language Models

Abstract: Recent advances in Large Language Models (LLMs) have shown promise across diverse reasoning tasks, yet their ability to perform structured spatio-temporal prediction remains underexplored. To address this, we introduce Car4Cast, a novel dataset and benchmark that casts 3D motion forecasting in autonomous driving as a spatial reasoning task requiring structured text generation. Car4Cast provides a comprehensive evaluation suite tailored to the unique challenges of language-based motion prediction, covering both classical trajectory accuracy and LLM-specific issues such as output formatting errors and hallucinations. The benchmark also supports an optional visual modality, enabling future exploration of vision-language models on spatial reasoning tasks. Car4Cast is intended to drive progress toward spatially intelligent language models: it highlights the gap between current LLMs and structured spatial prediction, and provides data and evaluation tools for developing new methods and training paradigms that bridge it.
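The abstract does not specify Car4Cast's answer format or its exact metric definitions. Purely as an illustration of how classical trajectory accuracy and LLM formatting failures can be scored together, the sketch below assumes a hypothetical answer format in which the model emits a sequence of `(x, y)` waypoints. It computes the standard average and final displacement errors (ADE/FDE) on parseable answers and counts unparseable or wrong-length answers as formatting failures. All names (`parse_trajectory`, `ade_fde`, `evaluate`) are hypothetical and not taken from the benchmark's code, and hallucination checks are not modeled.

```python
import math
import re
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# ASSUMPTION: hypothetical answer format of "(x, y)" waypoints.
# Car4Cast's actual prompt/answer schema may differ.
_POINT_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(text: str, horizon: int) -> Optional[List[Point]]:
    """Extract waypoints from model text; return None on a formatting failure."""
    points = [(float(x), float(y)) for x, y in _POINT_RE.findall(text)]
    if len(points) != horizon:
        return None  # missing, extra, or no waypoints count as a format error
    return points


def ade_fde(pred: List[Point], gt: List[Point]) -> Tuple[float, float]:
    """Average and final Euclidean displacement error, in the input's units."""
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


def evaluate(outputs: List[str], targets: List[List[Point]]) -> Dict[str, float]:
    """Score a batch of text answers; format failures are excluded from ADE/FDE."""
    ades: List[float] = []
    fdes: List[float] = []
    failures = 0
    for text, gt in zip(outputs, targets):
        pred = parse_trajectory(text, horizon=len(gt))
        if pred is None:
            failures += 1
            continue
        ade, fde = ade_fde(pred, gt)
        ades.append(ade)
        fdes.append(fde)
    n_ok = len(ades)
    return {
        "format_error_rate": failures / len(outputs),
        "ade": sum(ades) / n_ok if n_ok else float("nan"),
        "fde": sum(fdes) / n_ok if n_ok else float("nan"),
    }


if __name__ == "__main__":
    outputs = ["(1.0, 0.0) (2.0, 0.0) (3.0, 4.0)", "The car will turn left."]
    targets = [[(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]] * 2
    # format_error_rate 0.5; ADE 4/3 and FDE 4.0 on the one parseable answer
    print(evaluate(outputs, targets))
```

Reporting the formatting failure rate separately, rather than folding failures into ADE/FDE with a penalty value, keeps the two kinds of error distinguishable; a real benchmark might choose either convention.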

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 9444
