Keywords: Implicit Motion Blindness, MLLMs, Accessibility, Human-Centered AI, Position Paper
TL;DR: This position paper highlights "Implicit Motion Blindness"—MLLMs' inability to detect subtle motion—as a key flaw in video understanding, undermining user trust. We call for a shift from semantic recognition to physical perception.
Abstract: Multimodal Large Language Models (MLLMs) hold immense promise as assistive technologies for the blind and visually impaired (BVI) community.
However, we identify a critical failure mode that undermines their trustworthiness in real-world applications.
We introduce the ***Escalator Problem***, the inability of state-of-the-art models to perceive an escalator's direction of travel, as a canonical example of a deeper limitation we term ***Implicit Motion Blindness***.
This blindness stems from the dominant frame-sampling paradigm in video understanding: by treating a video as a discrete sequence of static images, models fundamentally struggle to perceive continuous, low-signal motion.
This is a position paper: rather than proposing a new model, we (I) formally articulate this blind spot, (II) analyze its implications for user trust, and (III) issue a call to action.
We advocate for a paradigm shift from purely semantic recognition towards robust physical perception and urge the development of new, human-centered benchmarks that prioritize safety, reliability, and the genuine needs of users in dynamic environments.
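To make the frame-sampling limitation concrete, below is a minimal sketch of the uniform-sampling front-end most MLLMs use for video. It assumes OpenCV and NumPy are available; the file name `escalator.mp4` and the choice of 8 frames are hypothetical illustrations, not details from the paper. The point it demonstrates: when motion is slow and periodic (escalator steps), sparsely sampled frames can look nearly identical, so the motion signal is aliased away before the model ever sees it.

```python
# Sketch of the uniform frame-sampling pipeline common in MLLM video
# front-ends, illustrating why slow, continuous motion (e.g. an
# escalator) can be nearly invisible to it. Assumes OpenCV and NumPy;
# "escalator.mp4" is a hypothetical input path.
import cv2
import numpy as np


def sample_frames(path: str, num_frames: int = 8) -> list[np.ndarray]:
    """Uniformly sample `num_frames` grayscale frames from a video."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    cap.release()
    return frames


def mean_frame_difference(frames: list[np.ndarray]) -> float:
    """Mean absolute pixel change between consecutive sampled frames.

    For slow, periodic motion such as escalator steps, sparse sampling
    can alias the signal so this value stays near zero; the motion is
    effectively invisible to a model that only sees these frames.
    """
    diffs = [np.abs(a.astype(float) - b.astype(float)).mean()
             for a, b in zip(frames, frames[1:])]
    return float(np.mean(diffs)) if diffs else 0.0


if __name__ == "__main__":
    frames = sample_frames("escalator.mp4", num_frames=8)
    print(f"sampled {len(frames)} frames, "
          f"mean inter-frame change: {mean_frame_difference(frames):.2f}")
```

A near-zero inter-frame change on a video that a sighted viewer immediately perceives as moving is exactly the failure mode the abstract describes: the representation discards the continuous motion before any semantic reasoning begins.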
Submission Number: 2