The One RING: a Robotic Indoor Navigation Generalist

Published: 09 Jun 2025, Last Modified: 09 Jun 2025 · Robo-3Dvlm Oral · CC BY 4.0
Keywords: Visual Indoor Navigation, Cross-Embodiment Policy
Abstract: Modern robots vary significantly in shape, size, and the sensor configurations used to perceive and interact with their environments. However, most navigation policies are embodiment-specific—a policy trained on one robot typically fails to generalize to another, even with minor changes in body size or camera viewpoint. As custom hardware becomes increasingly common, there is a growing need for a single policy that generalizes across embodiments, eliminating the need to (re-)train for each specific robot. In this paper, we introduce RING (Robotic Indoor Navigation Generalist), an embodiment-agnostic policy that turns any mobile robot into an effective indoor semantic navigator. Trained entirely in simulation, RING leverages large-scale randomization over robot embodiments to enable robust generalization to many real-world platforms. To support this, we augment the AI2-THOR simulator to instantiate robots with controllable configurations, varying in body size, rotation pivot point, and camera parameters. On the visual object-goal navigation task, RING achieves strong cross-embodiment (XE) generalization—72.1% average success rate across 5 simulated embodiments (a 16.7% absolute improvement on the Chores-S benchmark) and 78.9% across 4 real-world platforms, including Stretch RE-1, LoCoBot, and Unitree Go1—matching or even surpassing embodiment-specific policies. We further deploy RING on the RB-Y1 wheeled humanoid in a real-world kitchen environment, showcasing its out-of-the-box potential for mobile manipulation platforms.
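The abstract describes training over randomized robot configurations (body size, rotation pivot, camera parameters). A minimal sketch of what per-episode embodiment sampling could look like is below; all names, parameter ranges, and the config schema are assumptions for illustration, not taken from the paper or the AI2-THOR API.

```python
import random

def sample_embodiment(rng: random.Random) -> dict:
    """Hypothetical sketch: draw a random embodiment configuration
    (ranges and field names are illustrative assumptions)."""
    return {
        # Collision body dimensions in meters (width, depth, height).
        "body_size_m": (
            rng.uniform(0.2, 0.6),
            rng.uniform(0.2, 0.6),
            rng.uniform(0.5, 1.6),
        ),
        # Rotation pivot offset from the body center, in meters (x, z).
        "rotation_pivot_m": (
            rng.uniform(-0.2, 0.2),
            rng.uniform(-0.2, 0.2),
        ),
        # Camera placement and intrinsics.
        "camera": {
            "height_m": rng.uniform(0.3, 1.5),
            "pitch_deg": rng.uniform(-30.0, 30.0),
            "fov_deg": rng.uniform(60.0, 120.0),
        },
    }

if __name__ == "__main__":
    rng = random.Random(0)
    # A new embodiment would be sampled per training episode.
    config = sample_embodiment(rng)
    print(config)
```

A fresh configuration drawn at the start of each episode is what exposes the policy to many embodiments during training; at deployment time the real robot's fixed configuration simply falls inside the training distribution.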
Submission Number: 4