Abstract: This study explores using Large Language Models (LLMs) for voice-controlled robotics to improve human-robot interaction (HRI) and make it more accessible to non-experts. Implemented on Boston Dynamics’ Spot robot, the framework uses state-of-the-art wake word detection, speech-to-text, and text-to-speech systems within a behavior tree architecture. Results show successful task execution and command comprehension, despite challenges with speech-to-text accuracy and LLM reliability. Experiments demonstrate the framework’s potential for handling simple to moderately complex commands, highlighting the need for improved speech recognition and safety mechanisms. This project underscores the promise of LLMs in robotics and HRI, suggesting future directions for more advanced, multimodal interaction frameworks.
External IDs: doi:10.1007/978-981-96-3522-1_4