Unblocking the Path: VLM-Assisted Robot Navigation in Indoor Environments

Published: 08 Apr 2026, Last Modified: 08 Apr 2026 · CVPR 2026 Workshop WDFM-EAI Poster · CC BY 4.0
Keywords: Autonomous navigation, robotics, VLM-navigation reasoning, embodied AI
TL;DR: Our system integrates geometric planning and vision-language models so robots can identify, localize, and interact with objects that block progress during indoor navigation.
Abstract: Autonomous navigation is a fundamental capability for mobile robots, yet traditional methods largely treat the robot as a passive agent that follows preplanned paths while avoiding obstacles. Such approaches are effective in structured environments but fall short in human-centric indoor spaces where progress often requires active interaction, such as opening doors or using elevators. Large foundation models, and in particular vision-language models (VLMs), offer a new opportunity to address these challenges by combining scene understanding with high-level reasoning. In this work, we develop a navigation system that integrates classical geometric planning with VLM-based reasoning to enable robots to actively resolve situations where passive path following would fail. When blocked, the robot queries a VLM with sensory inputs and task context to determine what is preventing progress and how to overcome it. To ensure successful localization of interaction objects, such as door buttons, we couple the VLM with an open-vocabulary detector that grounds language-based reasoning into concrete visual cues. We implement this system on a real robot and evaluate it through proof-of-concept experiments, demonstrating the potential of VLM-assisted navigation for unblocking tasks in complex indoor environments.
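
To illustrate the pipeline the abstract describes, the following is a minimal Python sketch of the unblocking loop: when the geometric planner reports a blocked path, the robot queries a VLM for the object preventing progress, then grounds that answer with an open-vocabulary detector before interacting. All names here (`query_vlm`, `open_vocab_detect`, the `robot` interface) are hypothetical placeholders under assumed interfaces, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str
    box: tuple  # (x1, y1, x2, y2) pixel coordinates

def query_vlm(image, task_context: str) -> str:
    """Hypothetical VLM query: given the camera image and task context,
    return the object the robot should interact with (e.g. 'elevator button')."""
    raise NotImplementedError  # stand-in for a real VLM backend

def open_vocab_detect(image, label: str) -> Optional[Detection]:
    """Hypothetical open-vocabulary detector: ground the VLM's
    language-level answer into a concrete image region."""
    raise NotImplementedError  # stand-in for a real detector

def unblock(robot, task_context: str) -> None:
    """Sketch of the unblocking loop: identify what blocks progress,
    localize it, and interact until the path clears."""
    while robot.path_blocked():
        image = robot.capture_image()
        target = query_vlm(image, task_context)        # language-level reasoning
        detection = open_vocab_detect(image, target)   # visual grounding
        if detection is None:
            robot.reposition()   # change viewpoint and retry detection
            continue
        robot.interact(detection.box)  # e.g. press the button, push the door
```

The detector step mirrors the coupling described in the abstract: the VLM's language-based reasoning is grounded into concrete visual cues so interaction targets such as door buttons can be reliably localized.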
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 2