Keywords: Vision-based Navigation, Deep Learning
Abstract: Open-world navigation requires robots to make decisions in complex, dynamic environments and adapt to flexible task requirements. Traditional approaches often rely on hand-crafted goal metrics and struggle to generalize beyond specific tasks. Recent advances in vision-language-action (VLA) models enable end-to-end policies conditioned on natural language, but they typically require interactive training or large-scale data collection with a mobile agent. We frame navigation as a discrete sub-goal identification problem and extend our previous work, FrontierNet, a learning-based exploration system that detects and localizes frontiers directly from visual cues. We integrate FrontierNet with pre-trained vision-language models (VLMs) through a set-of-mark prompting strategy, enabling zero-shot, general-purpose navigation directly from natural language instructions. FrontierNet achieves state-of-the-art performance in autonomous exploration and, when combined with a VLM, demonstrates zero-shot adaptation across a variety of semantic tasks, such as object search, without requiring any additional training or map updates.
Submission Number: 7