Open Scene Graphs for Open-World Object-Goal Navigation

Published: 05 Apr 2024, Last Modified: 30 Apr 2024 · VLMNM 2024 · CC BY 4.0
Keywords: Robot Navigation, Vision-Language Models, Large Language Models
Abstract: Can we build a system to perform object-goal navigation (ObjectNav) in the open world? Advances in foundation models point toward this possibility: Large Language Models (LLMs) are strong semantic reasoners, and robotic foundation models generalise across environments and embodiments. We propose a zero-shot open-world ObjectNav system built purely by composing foundation models, which we call Explorer. To effectively harness these models’ abilities for planning, localisation and other robot functions, we also need a representation serving as a persistent memory to retain and organise information used by the models. To fill this need we propose the Open Scene Graph (OSG), a rich, structured topo-semantic scene representation, whose structure can be dynamically configured to suit different environments. We design the OSG mapper, a module for constructing OSGs, which is built fully from foundation models. To achieve open-world ObjectNav, Explorer brings together the OSG mapper, an LLM-based planner and a General Navigation Model (GNM)-based navigation policy, connecting them with the OSG. We demonstrate that LLM-based planning using the structured information from an OSG allows us to outperform existing LLM ObjectNav approaches by a wide margin. We also show that Explorer is capable of effective object-goal navigation in the real world across different robots and novel instructions.
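To make the abstract's central idea concrete, the sketch below illustrates what a topo-semantic scene graph with a configurable hierarchy might look like, and how it could be flattened into text for an LLM planner. This is a minimal illustrative sketch, not the paper's implementation: all class names, the hierarchy levels, and the `describe` serialisation are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an Open Scene Graph (OSG): a topo-semantic graph
# whose node hierarchy (e.g. place -> object) is configurable per
# environment. Names are illustrative, not taken from the paper.

@dataclass
class OSGNode:
    node_id: str
    node_type: str                      # e.g. "place" or "object"
    label: str                          # open-vocabulary semantic label
    children: list = field(default_factory=list)

class OpenSceneGraph:
    def __init__(self, hierarchy):
        self.hierarchy = hierarchy      # ordered node types, set per environment
        self.nodes = {}                 # node_id -> OSGNode
        self.edges = []                 # topological (traversability) edges

    def add_node(self, node_id, node_type, label, parent_id=None):
        assert node_type in self.hierarchy, f"unknown node type {node_type!r}"
        node = OSGNode(node_id, node_type, label)
        self.nodes[node_id] = node
        if parent_id is not None:       # attach under a parent in the hierarchy
            self.nodes[parent_id].children.append(node_id)
        return node

    def connect(self, a, b):
        # undirected topological edge between two places
        self.edges.append((a, b))

    def describe(self):
        # Flatten the graph into text an LLM-based planner could consume.
        lines = []
        for node in self.nodes.values():
            kids = [self.nodes[c].label for c in node.children]
            lines.append(f"{node.node_type} '{node.label}' contains: {kids}")
        return "\n".join(lines)

# Example: a home environment with a two-level hierarchy.
osg = OpenSceneGraph(hierarchy=["place", "object"])
osg.add_node("p1", "place", "kitchen")
osg.add_node("o1", "object", "fridge", parent_id="p1")
osg.add_node("p2", "place", "hallway")
osg.connect("p1", "p2")
print(osg.describe())
```

The point of the `describe` step is that a structured representation can be serialised on demand into a prompt, which is one plausible way an LLM planner could exploit an OSG's structure.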
Submission Number: 33