Keywords: Accessibility for People with Visual Impairments, Walking Assistance, Multimodal Scene Understanding, Multiview Perception
Abstract: Walking assistance in extreme or complex environments remains a significant challenge for people with blindness or low vision (BLV), largely due to the lack of a holistic scene understanding. Motivated by the real-world needs of the BLV community, we build mmWalk, a simulated multimodal dataset that integrates multi-view sensor data and accessibility-oriented features for safe outdoor navigation. Our dataset comprises $120$ manually controlled, scenario-categorized walking trajectories with $62k$ synchronized frames. It contains over $559k$ panoramic images across RGB, depth, and semantic modalities. To emphasize real-world relevance, each trajectory covers outdoor corner cases and accessibility-specific landmarks for BLV users. Additionally, we generate mmWalkVQA, a VQA benchmark with over $69k$ visual question-answer triplets across $9$ categories tailored for safe and informed walking assistance. We evaluate state-of-the-art Vision-Language Models (VLMs) in zero- and few-shot settings and find that they struggle with our risk assessment and navigation tasks. We validate our mmWalk-finetuned model on real-world datasets and show the effectiveness of our dataset for advancing multimodal walking assistance.
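As a rough illustration of how the mmWalkVQA triplets might be consumed in a zero-shot evaluation loop, here is a minimal Python sketch. The file layout, field names (`image`, `question`, `answer`, `category`), and the `model.generate` wrapper are assumptions for illustration only, not the released schema; consult the dataset and code repositories above for the actual format.

```python
# Hypothetical sketch: iterate (image, question, answer) triplets and score a VLM
# per category with exact-match accuracy. Field names and layout are assumed.
import json
from pathlib import Path


def load_vqa_triplets(vqa_path: str):
    """Yield (image, question, answer, category) from a JSON list of records.

    Assumes each record stores an image path, a question string, an answer
    string, and one of the benchmark's 9 task categories.
    """
    with open(vqa_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    for rec in records:
        yield rec["image"], rec["question"], rec["answer"], rec["category"]


def zero_shot_eval(vqa_path: str, model, image_root: str = "."):
    """Query a VLM once per triplet (no in-context examples) and return
    per-category exact-match accuracy."""
    correct, total = {}, {}
    for image, question, answer, category in load_vqa_triplets(vqa_path):
        # `model` is a user-supplied wrapper exposing generate(image_path, prompt) -> str
        prediction = model.generate(Path(image_root) / image, question)
        total[category] = total.get(category, 0) + 1
        if prediction.strip().lower() == answer.strip().lower():
            correct[category] = correct.get(category, 0) + 1
    return {c: correct.get(c, 0) / n for c, n in total.items()}
```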
Dataset URL: https://doi.org/10.7910/DVN/KKDXDK
Code URL: https://github.com/KediYing/mmWalk.git
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 345