Lang2LTL-2: Grounding Spatiotemporal Navigation Commands Using Large Language and Vision-Language Models

Published: 07 May 2025 (Last Modified: 07 May 2025)
Venue: ICRA Workshop Human-Centered Robot Learning
License: CC BY 4.0
Workshop Statement: This paper relates to human-centered robot learning: the proposed work leverages foundation models, specifically pretrained and efficiently finetuned large language models and a vision-language model, to ground spatiotemporal navigation commands for mobile robots in novel environments.
Keywords: foundation models, embodied AI, robotics, language grounding
TL;DR: This work leverages foundation models, specifically pretrained and efficiently finetuned large language models and a vision-language model, to ground spatiotemporal navigation commands for mobile robots in novel environments.
Abstract: Grounding spatiotemporal navigation commands to structured task specifications enables autonomous robots to understand a broad range of natural language and solve long-horizon tasks with safety guarantees. Prior work mostly focuses on grounding either spatial language or temporally extended language for robots. We propose Lang2LTL-2, a modular system that leverages pretrained large language and vision-language models, together with multimodal semantic information, to ground spatiotemporal navigation commands in novel city-scale environments without retraining. Lang2LTL-2 achieves 93.53% language grounding accuracy on a dataset of 21,780 semantically diverse natural language commands in unseen environments. We run an ablation study to validate the need for the different input modalities. We also show that a physical robot equipped with the same system, without modification, can execute 50 semantically diverse natural language commands in both indoor and outdoor environments.
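For intuition, here is a small illustrative example of the grounding target, not drawn from the paper: assuming the structured task specification is a linear temporal logic (LTL) formula over propositions resolved to landmarks in the robot's semantic map, a command such as "Go to the cafe, then the bank, and always avoid the construction zone" could ground to

    F(\mathrm{cafe} \wedge F(\mathrm{bank})) \wedge G(\neg \mathrm{construction\_zone})

where F means "eventually" and G means "always"; the proposition names cafe, bank, and construction_zone are placeholders for grounded landmark references, not identifiers from the paper's dataset.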
Supplementary Material: pdf
Submission Number: 25