Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models

Published: 07 May 2025 (Last Modified: 07 May 2025)
Venue: ICRA Workshop Human-Centered Robot Learning
License: CC BY 4.0
Workshop Statement: This paper relates to human-centered robot learning: the proposed work leverages foundation models, specifically pretrained and efficiently finetuned large language models and a vision-language model, to ground spatiotemporal navigation commands for mobile robots in novel environments.
Keywords: foundation models, embodied AI, robotics, language grounding
TL;DR: This work leverages foundation models, specifically pretrained and efficiently finetuned large language models and a vision-language model, to ground spatiotemporal navigation commands for mobile robots in novel environments.
Abstract: Grounding spatiotemporal navigation commands to structured task specifications enables autonomous robots to understand a broad range of natural language and solve long-horizon tasks with safety guarantees. Prior work mostly focuses on grounding either spatial language or temporally extended language for robots. We propose Lang2LTL-2, a modular system that leverages pretrained large language and vision-language models, together with multimodal semantic information, to ground spatiotemporal navigation commands in novel city-scale environments without retraining. Lang2LTL-2 achieves 93.53% language grounding accuracy on a dataset of 21,780 semantically diverse natural language commands in unseen environments. We run an ablation study to validate the need for the different input modalities. We also show that a physical robot equipped with the same system, without modification, can execute 50 semantically diverse natural language commands in both indoor and outdoor environments.
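For intuition, here is a small illustrative example of the grounding target, not drawn from the paper: assuming the structured task specification is a linear temporal logic (LTL) formula over propositions resolved to landmarks in the robot's semantic map, a command such as "Go to the cafe, then the bank, and always avoid the construction zone" could ground to

    F(\mathrm{cafe} \wedge F(\mathrm{bank})) \wedge G(\neg \mathrm{construction\_zone})

where F means "eventually" and G means "always"; the proposition names cafe, bank, and construction_zone are placeholders for grounded landmark references, not identifiers from the paper's dataset.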
Supplementary Material: pdf
Submission Number: 25