Keywords: LLM spatial reasoning, zero-shot, robot navigation
Abstract: We introduce **LLM-Navi**, a novel framework based on large language models (LLMs) for autonomous navigation in dynamic and cluttered environments. Unlike prior work that constrains LLMs to simplistic, static settings with limited movement options, LLM-Navi enables robust spatial reasoning in realistic, multi-agent scenarios by uniformly encoding environments (e.g., real-world floorplans), dynamic agents, and their trajectories as *tokens*. This encoding unlocks the zero-shot spatial reasoning capabilities inherent in LLMs without retraining or fine-tuning. LLM-Navi supports multi-agent coordination, dynamic obstacle avoidance, and closed-loop replanning, and generalizes across diverse agents, tasks, and environments through purely text-based interactions. Our experiments show that LLMs can autonomously generate collision-free trajectories, adapt to dynamic changes, and resolve multi-agent conflicts in real time. We further extend the framework to humanoid motion generation, showcasing its potential for real-world applications in robotics and human-robot interaction. This work establishes an initial foundation for integrating LLMs into embodied spatial reasoning tasks, offering a scalable and semantically grounded alternative to traditional planning methods.
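The abstract does not specify the exact token encoding, so the following is only a minimal sketch of the general idea, assuming a grid-based floorplan serialized into a zero-shot text prompt; the grid symbols, prompt wording, and all function names (`encode_floorplan`, `build_prompt`, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: serialize a grid floorplan, agent poses, and goals
# into plain-text "tokens" for a zero-shot LLM navigation prompt.
# Symbols, prompt format, and helper names are assumptions, not the paper's encoding.

from typing import Dict, List, Tuple

Cell = str            # '.' = free space, '#' = obstacle
Pose = Tuple[int, int]

def encode_floorplan(grid: List[List[Cell]]) -> str:
    """Render the occupancy grid row by row as a text block."""
    return "\n".join("".join(row) for row in grid)

def encode_agents(agents: Dict[str, Pose], goals: Dict[str, Pose]) -> str:
    """List each agent's current cell and goal cell as (row, col) pairs."""
    lines = []
    for name, (r, c) in agents.items():
        gr, gc = goals[name]
        lines.append(f"{name}: at ({r},{c}), goal ({gr},{gc})")
    return "\n".join(lines)

def build_prompt(grid: List[List[Cell]],
                 agents: Dict[str, Pose],
                 goals: Dict[str, Pose]) -> str:
    """Compose a zero-shot planning prompt; the LLM replies with waypoints."""
    return (
        "Floorplan ('#' = obstacle, '.' = free):\n"
        f"{encode_floorplan(grid)}\n\n"
        f"Agents:\n{encode_agents(agents, goals)}\n\n"
        "For each agent, output a collision-free list of (row,col) waypoints."
    )

if __name__ == "__main__":
    grid = [list("....."),
            list("..#.."),
            list("..#.."),
            list(".....")]
    agents = {"robot_A": (0, 0), "robot_B": (3, 4)}
    goals  = {"robot_A": (3, 4), "robot_B": (0, 0)}
    print(build_prompt(grid, agents, goals))  # prompt text to send to an LLM API
```

Presumably, the returned waypoint text would then be parsed and validated before execution, and the prompt re-issued whenever agents conflict or the environment changes, which would realize the closed-loop replanning described above.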
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12121