WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We introduce WOMD-Reasoning, the largest language dataset focusing on traffic rule-induced interactions in driving.
Abstract: Language models have shown unprecedented abilities in analyzing driving scenarios, owing to the vast knowledge they accumulate through text-based pre-training. Naturally, they should particularly excel at analyzing rule-based interactions, such as those triggered by traffic laws, which are well documented in text. However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. We therefore propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive, large-scale Q&A dataset built on WOMD that focuses on describing and reasoning about traffic rule-induced interactions in driving scenarios. WOMD-Reasoning is also by far the largest multi-modal Q&A dataset, with 3 million Q&As on real-world driving scenarios covering a wide range of driving topics, from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. To showcase the applications of WOMD-Reasoning, we design Motion-LLaVA, a motion-language model fine-tuned on WOMD-Reasoning. Quantitative and qualitative evaluations of the WOMD-Reasoning dataset and of Motion-LLaVA's outputs support the data quality of WOMD-Reasoning and its wide applicability to tasks such as interaction prediction and traffic rule-compliant planning. The dataset and its vision modality extension are available at https://waymo.com/open/download/. The code and prompts used to build it are available at https://github.com/yhli123/WOMD-Reasoning.
Lay Summary: Understanding how vehicles interact on the road, especially when traffic rules are involved, is key to building safe and intelligent driving systems. Today's language-assisted driving models still struggle to analyze these situations because little training data captures traffic rule-based interactions. To address this, we created WOMD-Reasoning, the largest dataset to date of driving questions and answers built on real-world traffic data. It includes 3 million Q&A examples that describe maps, vehicle movements, and agent interactions, especially those influenced by traffic rules. Unlike past efforts, our dataset focuses not just on what is happening but on why, providing the context and reasoning behind road behaviors. We also introduce Motion-LLaVA, a model trained on WOMD-Reasoning that can understand and explain driving scenarios in a more rule-aware manner. Thanks to the WOMD-Reasoning dataset, our model performs strongly at predicting interactions and enables safer, rule-compliant planning. Our work lays the groundwork for more explainable and regulation-aware AI systems in autonomous driving. All data and tools are freely available to the research community.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/yhli123/WOMD-Reasoning
Primary Area: Applications->Robotics
Keywords: Language Q&A Dataset, Autonomous Driving, Interaction Reasoning, Multi-modal Learning
Submission Number: 8197