Demo Abstract: Embodied Aerial Agent for City-level Visual Language Navigation Using Large Language Model

Published: 2024, Last Modified: 07 Oct 2025, IPSN 2024, CC BY-SA 4.0
Abstract: As unmanned aerial vehicles (UAVs) become more prevalent in smart cities, their capacity for visual language navigation (VLN) is attracting increasing interest. VLN in cities has significant applications in delivery, rescue, and security patrol, among other fields. One of the most representative tasks is navigating to a specific location by following language instructions. While some current methods have achieved notable results in indoor settings, challenges persist outdoors, including agents’ inaccurate spatial understanding and ambiguous language instructions. In this work, we explore an embodied navigation agent design that introduces a fine-grained spatial verbalizer and a history path memory to guarantee accurate VLN in open 3D urban environments.