OWLS: Open-World Language-driven Servoing for Zero-Shot Traversability
Keywords: Robot Navigation, Traversability, Vision Language Models
Abstract: Traversable path selection by an autonomous robot requires reasoning over both the geometry and the semantics of an environment. Scene geometry provides occupancy information for collision avoidance, while scene semantics convey physical affordances such as how much weight a surface can bear. In this work, we propose jointly optimizing the robot's trajectory over semantically rich features from vision language models (VLMs) together with scene geometry estimates in a modified approach to visual servoing. We select semantically informed navigation actions toward a given goal location directly from camera-space optimization. We perform indoor and outdoor navigation experiments on a wheeled rover to select traversable paths. We find that VLM-based visual servoing enables lightweight, semantically informed navigation without task-specific model training, prior knowledge of the scene, or retention of a globally consistent map. These findings support the viability of this methodological direction for robotic autonomy in remote environments such as space.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 15