AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans

ACM SGA 2025 Workshop TriFusion Submission 3 Authors

12 Sept 2025 (modified: 16 Sept 2025) · CC BY 4.0
Keywords: Visual language navigation, embodied AI, agent, dynamic obstacles
Abstract: Visual Language Navigation (VLN) is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, increasing task complexity and realism. To support exploration of this task, we also present AdaVLN simulator and AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We benchmark several baseline models in simulation and real environments, analyze the unique challenges of AdaVLN, and show its potential to narrow the sim-to-real gap in VLN research.
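The "freeze-time" idea in the abstract can be illustrated with a minimal sketch: the simulated world clock advances only by fixed ticks between inference calls, so the agent's wall-clock latency never changes what it observes. All class and function names below are illustrative assumptions, not the actual AdaVLN API.

```python
import time


class FreezeTimeSimulator:
    """Toy sketch of a freeze-time loop (hypothetical, not the AdaVLN code).

    World state is updated only in world_step(); during agent inference the
    simulated clock and the moving humans are frozen, so slow hardware does
    not alter the episode.
    """

    def __init__(self, step_dt=0.1):
        self.step_dt = step_dt          # fixed simulated timestep (seconds)
        self.sim_time = 0.0             # simulated world clock
        self.human_positions = [0.0]    # toy stand-in for animated humans

    def world_step(self):
        # Advance the world by exactly one tick; humans move only here.
        self.sim_time += self.step_dt
        self.human_positions = [p + 0.05 for p in self.human_positions]

    def run_episode(self, agent_policy, n_steps):
        for _ in range(n_steps):
            obs = (self.sim_time, tuple(self.human_positions))
            # World is frozen here: however long agent_policy takes in
            # wall-clock time, sim_time and the humans do not change.
            agent_policy(obs)
            self.world_step()           # resume: apply exactly one tick
        return self.sim_time


def slow_policy(obs):
    # Wall-clock delay (e.g. slow inference hardware) has no effect
    # on the simulated clock.
    time.sleep(0.01)
    return "forward"


sim = FreezeTimeSimulator()
final_t = sim.run_episode(slow_policy, n_steps=10)  # 10 ticks of 0.1 s
```

Because the world advances one fixed tick per decision regardless of inference latency, two runs on different hardware see identical observation sequences, which is what makes cross-hardware comparisons reproducible.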
Submission Number: 3