FROM STEERING TO PEDALLING: DO AUTONOMOUS DRIVING VLMS GENERALIZE TO CYCLIST-ASSISTIVE SPATIAL PERCEPTION AND PLANNING?

Published: 02 Mar 2026, Last Modified: 15 Apr 2026
Venue: ES-Reasoning @ ICLR 2026
License: CC BY 4.0
Keywords: Spatial perception, autonomous driving, traffic understanding
TL;DR: We evaluate state-of-the-art VLMs, including spatially enhanced models, on spatial perception and planning from a cyclist's perspective.
Abstract: Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist’s perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities, while also revealing clear areas for improvement in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.
Submission Number: 75