Keywords: scene graphs, human-robot interaction, visualization, agentic AI, large language models
TL;DR: This paper introduces SceneChat, a system that uses tool-calling LLM agents to let non-expert users naturally query and control robots through 3D scene graphs, improving human–machine teaming in public safety scenarios.
Abstract: The development of real-time hierarchical 3D scene graphs allows a machine to quickly build a model of its environment. This model is machine-readable and designed for inference efficiency from a robotics perspective, but it lacks a well-designed interface for non-expert users to team with the machine. We introduce a system architecture that relies on large language model (LLM) tool agents to interact with, query, and update the underlying world model, and to guide the machine's interactions with the environment. Through two user scenarios derived from discussions with emergency incident responders, we analyze the capabilities of our visual and language interface. Finally, we give recommendations on what information should be collected or derived from 3D scene graphs to support human-machine teaming (HMT), and we discuss the successes and limitations of the current generation of LLMs in supporting user interaction.
Submission Number: 4