Keywords: VLN, embodied dialog
Abstract: For embodied agents capable of physical interaction, dialog capability is crucial to ensure both safety and effectiveness.
While DialNav provides a framework for holistic evaluation of the dialog–execution loop in photorealistic indoor navigation, the performance of prior baselines on it remains limited.
In this work, we introduce advances spanning data, training, and localization.
First, we develop a large-scale dialog generation pipeline to enhance coverage and diversity. Second, we propose task-aligned training for the Navigator to better reflect the dynamic dialog–navigation loop.
Finally, we address the bottleneck of localization with a stronger graph-aware transformer model.
Together, these advances more than double success rates over prior baselines, achieving 58.24% SR on Val Seen and 29.05% on Val Unseen, establishing a new state of the art in dialog-driven embodied navigation.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 9986