CompassNav: Steering From Path Imitation to Decision Understanding In Navigation

Published: 26 Jan 2026, Last Modified: 11 Feb 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Embodied AI, Goal-Driven Navigation, Large Vision-Language Models, Reinforcement Fine-Tuning
TL;DR: We shift navigation from "Path Imitation" to "Decision Understanding." With a novel, densely-annotated dataset and a gap-aware reward, our 7B agent learns to evaluate all moves, achieving SOTA results that surpass even larger models.
Abstract: The dominant paradigm for training Large Vision-Language Models (LVLMs) in navigation relies on imitating expert trajectories. This approach reduces the complex navigation task to sequence-to-sequence replication of a single correct path, fundamentally limiting the agent's ability to explore and generalize. In this work, we argue for and introduce a new paradigm: a shift from Path Imitation to Decision Understanding. The goal of this paradigm is to build agents that do not just follow, but truly understand how to navigate. We materialize this through two core contributions. First, we introduce Compass-Data-22k, a novel 22k-trajectory dataset. Its Reinforcement Fine-Tuning (RFT) subset provides a panoramic view of the decision landscape by annotating all feasible actions with A* geodesic distances. Second, we design a novel gap-aware hybrid reward function that dynamically adapts its feedback to decision certainty, shifting between decisive signals for optimal actions and nuanced scores that encourage exploration. Integrated into an SFT-then-RFT recipe, our CompassNav agent is trained not to memorize static routes but to develop an internal 'compass' that constantly intuits the direction to the goal by evaluating the relative quality of all possible moves. This approach enables our 7B agent to set a new state-of-the-art on navigation benchmarks, outperforming even larger proprietary models, and to achieve robust real-world goal navigation on a physical robot.
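The abstract's gap-aware hybrid reward can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function name `gap_aware_reward`, the threshold value, and the linear grading scheme are hypothetical, not the paper's actual formulation. The sketch only captures the stated idea that all feasible actions carry A* geodesic distances to the goal, and that the reward shifts between a decisive binary signal (when one action is clearly best) and a graded score (when several actions are nearly as good, to encourage exploration).

```python
def gap_aware_reward(chosen: str, geodesic: dict[str, float],
                     gap_threshold: float = 0.5) -> float:
    """Hypothetical gap-aware reward sketch (names/values are assumptions).

    geodesic maps each feasible action to its A* geodesic distance to the
    goal, as in the Compass-Data-22k RFT annotations described above.
    """
    dists = sorted(geodesic.values())
    best = dists[0]
    second_best = dists[1] if len(dists) > 1 else best
    gap = second_best - best  # certainty proxy: margin of the best action

    if gap >= gap_threshold:
        # High certainty: decisive binary signal for the optimal action.
        return 1.0 if geodesic[chosen] == best else 0.0

    # Low certainty: graded score so near-optimal moves still earn reward,
    # encouraging exploration among comparably good actions.
    worst = dists[-1]
    if worst == best:
        return 1.0  # all actions equally good
    return 1.0 - (geodesic[chosen] - best) / (worst - best)
```

For example, with distances `{"left": 1.0, "forward": 3.0}` the margin is large, so only the optimal move is rewarded; with `{"left": 1.0, "forward": 1.2, "right": 2.0}` the margin is small and a near-optimal move still receives a partial score.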
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 3064