LLM-Navi: Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning

We introduce LLM-Navi, a novel large language model (LLM)-based framework for autonomous navigation in dynamic and cluttered environments. Unlike prior studies that constrain LLMs to simplistic, static settings with limited movement options, LLM-Navi enables robust spatial reasoning in realistic, multi-agent scenarios by uniformly encoding environments (e.g., real-world floorplans), dynamic agents, and their trajectories as tokens. In doing so, we unlock the zero-shot spatial reasoning capabilities inherent in LLMs without retraining or fine-tuning. LLM-Navi supports multi-agent coordination, dynamic obstacle avoidance, and closed-loop replanning, and generalizes across diverse agents, tasks, and environments through text-based interactions. Our experiments show that LLMs can autonomously generate collision-free trajectories, adapt to dynamic changes, and resolve multi-agent conflicts in real time. We further extend the framework to humanoid motion generation, showcasing its potential for real-world applications in robotics and human-robot interaction. This work thus establishes a foundation for integrating LLMs into embodied spatial reasoning tasks, offering a scalable and semantically grounded alternative to traditional methods.
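The key idea of encoding the environment, agents, and goals as plain-text tokens can be sketched as follows. This is an illustrative assumption of how such a serialization might look, not the authors' actual format: the function name `encode_scene`, the `#`/`.` grid symbols, and the agent dictionary layout are all hypothetical.

```python
# Hypothetical sketch of LLM-Navi-style scene tokenization: serialize an
# occupancy grid (the floorplan) and each agent's start/goal into a text
# prompt that a frozen LLM can reason over zero-shot.

def encode_scene(grid, agents):
    """Serialize an occupancy grid and agent states into a text prompt.

    grid   -- 2D list, 1 = wall, 0 = free cell
    agents -- dict mapping agent name to ((start_x, start_y), (goal_x, goal_y))
    """
    lines = ["Floorplan (#=wall, .=free):"]
    for row in grid:
        lines.append("".join("#" if cell else "." for cell in row))
    for name, (start, goal) in agents.items():
        lines.append(f"{name}: start={start} goal={goal}")
    lines.append("Output a collision-free path for each agent as (x,y) waypoints.")
    return "\n".join(lines)

grid = [
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
agents = {"agent 1": ((0, 0), (2, 3)), "agent 2": ((2, 0), (0, 2))}
prompt = encode_scene(grid, agents)
```

Because the scene is just text, the same serialization works for any environment or number of agents, and the LLM's reply (a list of waypoints per agent) can be parsed back into trajectories for execution and closed-loop replanning.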


Overview


Qualitative Results



Two agents playing hide-and-seek


"You are now a hider,

find somewhere to hide."

"Seeker designated! The hider may be anywhere.

Find the hider!"


Dynamic Conflict Resolution

Plan paths for three agents: agent 1 from (1,1) to (8,8); agent 2 from (1,8) to (8,1); agent 3 from (4,8) to (4,1).

Original trajectory

Updated trajectory after conflict resolution by the LLM
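The conflict the LLM is asked to resolve here is two agents occupying the same cell at the same timestep. A minimal sketch of such a check, assuming discrete per-step trajectories (this is an illustration, not the authors' implementation; the example paths are made up):

```python
# Illustrative conflict check behind closed-loop replanning: flag every
# timestep at which two agents' trajectories would place them in the same
# grid cell. Detected conflicts are fed back to the LLM for resolution.

def find_conflicts(paths):
    """Return (time, cell, agent_a, agent_b) tuples where trajectories collide."""
    conflicts = []
    horizon = max(len(p) for p in paths.values())
    for t in range(horizon):
        seen = {}  # cell -> agent occupying it at time t
        for name, path in paths.items():
            cell = path[min(t, len(path) - 1)]  # agents wait at their goal
            if cell in seen:
                conflicts.append((t, cell, seen[cell], name))
            else:
                seen[cell] = name
    return conflicts

# Made-up path fragments where two agents cross the same cells:
paths = {
    "agent 2": [(1, 8), (2, 7), (3, 6), (4, 5)],
    "agent 3": [(4, 8), (4, 7), (3, 6), (4, 5)],
}
conflicts = find_conflicts(paths)  # collisions at t=2 and t=3
```

In the framework, the original trajectories plus a textual description of such conflicts would be handed back to the LLM, which emits the updated, collision-free trajectories shown above.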


Agent 1 walks from (2,3) to (17,8); agent 2 walks from (4,8) to (3,17); agent 3 walks from (17,8) to (4,1); agent 4 walks from (16,15) to (6,2); agent 5 walks from (5,20) to (1,7).

Top-down view

Generated motion


More Examples

A person runs from (12,1) to (11,7), out of a room that is on fire.

Top-down view

Generated motion


Agent 1 walks sideways from (1,1) to (18,8); agent 2 walks with hands in front from (2,18) to (9,19); agent 3 walks from (15,2) to (13,20).

Top-down view

Generated motion


Agent walks from (15,8) to (15,24).

Top-down view

Generated motion


Agent 1 walks from (2,2) to (11,9); agent 2 walks from (2,9) to (11,2); agent 3 walks from (13,2) to (2,9).

Top-down view

Generated motion


Extension to 2.5D


Agent walks up the stairs from (1,4,0) to (5,4,1).