The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving

Jay Patrikar; Apoorva Sharma; Sushant Veer; Boyi Li; Sebastian Scherer; Marco Pavone

Published: 06 Sept 2025, Last Modified: 26 Sept 2025. Venue: CoRL 2025 Robot Data Workshop. License: CC BY 4.0

Keywords: Autonomous Driving, Retrieval-Augmented Reasoning, Robot Safety

Abstract: Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety–performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use: narratives are unstructured, third-person, and poorly grounded to sensor views. We address these challenges by normalizing crash narratives to ego-centric language and converting both logs and crashes into a unified scene–action representation suitable for retrieval. At decision time, our system adjudicates proposed actions by retrieving relevant precedents from this unified index; an agentic counterfactual extension proposes plausible alternatives, retrieves for each, and reasons across outcomes before deciding. On a nuScenes benchmark, precedent retrieval substantially improves calibration, with recall on contextually preferred actions rising from 24% to 53%. The counterfactual variant preserves these gains while sharpening decisions near risk.
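
To make the retrieve-then-adjudicate loop described in the abstract concrete, here is a minimal illustrative sketch. It is not the authors' implementation. Every name in it (`Precedent`, `retrieve`, `risk`, `decide`, `INDEX`) is hypothetical, the bag-of-words cosine similarity stands in for whatever retriever the system actually uses, and the similarity-weighted crash fraction stands in for the model-based reasoning over retrieved precedents.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Precedent:
    """One entry in a toy unified scene-action index (hypothetical schema)."""
    scene: str     # ego-centric scene description
    action: str    # action taken by the ego vehicle
    crashed: bool  # True if drawn from a crash report, False if from a driving log


def _bow(text):
    # Bag-of-words vector; an assumed stand-in for a learned embedding.
    return Counter(text.lower().split())


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(index, scene, action, k=2):
    """Return the k precedents most similar to the (scene, action) query."""
    query = _bow(f"{scene} {action}")
    scored = [(_cosine(query, _bow(f"{p.scene} {p.action}")), p) for p in index]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:k]


def risk(index, scene, action, k=2):
    """Similarity-weighted fraction of retrieved precedents that ended in a crash."""
    hits = retrieve(index, scene, action, k)
    total = sum(s for s, _ in hits)
    if total == 0:
        return 0.5  # no usable evidence: assumed neutral prior
    return sum(s for s, p in hits if p.crashed) / total


def decide(index, scene, proposed, alternatives, k=2):
    """Counterfactual variant: retrieve for each candidate, pick the lowest risk."""
    candidates = [proposed] + [a for a in alternatives if a != proposed]
    risks = {a: risk(index, scene, a, k) for a in candidates}
    best = min(candidates, key=lambda a: risks[a])  # ties keep the proposed action
    return best, risks


# Toy index mixing one crash narrative with two incident-free log entries.
INDEX = [
    Precedent("wet road pedestrian crossing ahead", "maintain speed", True),
    Precedent("wet road pedestrian crossing ahead", "slow down", False),
    Precedent("clear highway no traffic", "maintain speed", False),
]

if __name__ == "__main__":
    best, risks = decide(
        INDEX, "wet road pedestrian crossing ahead", "maintain speed", ["slow down"]
    )
    print(best, risks)
```

In this toy index, the crash precedent matches the proposed "maintain speed" query exactly, so the sketch assigns it a higher risk than the "slow down" alternative and selects the latter. With an empty index, every candidate gets the neutral prior and the proposed action is kept.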

Lightning Talk Video: mp4

Submission Number: 32
