Keywords: Human-Robot Interaction, Virtual Reality, Shared Control, Teleoperation, Telepresence, LLM
TL;DR: Integrating LLMs with vision foundation models leads to better outcomes in shared-control-based teleoperation.
Abstract: Teleoperation in robotic systems encompasses three primary modes of control: full teleoperation, shared control, and autonomous operation. Full teleoperation gives human operators complete control over the robot, enabling real-time manipulation and decision-making. Shared control, a hybrid approach, integrates elements of both teleoperation and autonomous control, permitting human intervention in specific scenarios while maintaining a degree of autonomous functionality. Autonomous operation relies entirely on the robot's decision-making algorithms to perform tasks without human input. Although shared control has proven effective in static environments, recent studies indicate that its benefits diminish in dynamic settings due to the increased cognitive load on the human operator and the frequent need to switch between modes. The advent of multimodal large language models (LLMs) such as GPT-4 and Gemini has significantly advanced visual scene understanding and language-based reasoning. These capabilities can enhance shared control systems by allowing operators to act as global planners and issue natural language commands, reducing the need for constant mode switching. This paper proposes a novel approach that combines language-driven machine learning models with shared control frameworks to improve human-robot interaction in both static and dynamic environments. We develop a language-model-guided shared control mechanism and evaluate its performance across various settings. Results from both qualitative feedback and quantitative metrics demonstrate that our LLM-based shared controller reduces operator cognitive burden while improving overall task performance.
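To make the proposed pipeline concrete, the sketch below shows one possible way a language-model-guided shared controller could be wired together: the operator issues a natural language command, an LLM grounds it against objects reported by a vision model, and the resulting subgoal is blended with the operator's direct input. This is a minimal illustration under assumed interfaces, not the paper's implementation; the names SceneObject, build_prompt, query_llm, blend_commands, and the blending weight alpha are all hypothetical, and the LLM call is stubbed with a canned response so the example runs offline.

```python
"""Illustrative sketch of a language-guided shared-control loop (hypothetical interfaces)."""
import json
from dataclasses import dataclass


@dataclass
class SceneObject:
    label: str       # e.g. from an open-vocabulary vision foundation model
    position: tuple  # (x, y, z) in the robot frame, metres


def build_prompt(command: str, objects: list) -> str:
    """Pack the operator's natural language command and the detected scene into one prompt."""
    scene = "\n".join(f"- {o.label} at {o.position}" for o in objects)
    return (
        "You are assisting a shared-control teleoperation system.\n"
        f"Operator command: {command}\n"
        f"Detected objects:\n{scene}\n"
        'Reply with JSON: {"target_label": ..., "target_position": [x, y, z]}'
    )


def query_llm(prompt: str) -> str:
    """Placeholder for a multimodal LLM call (e.g. GPT-4 or Gemini).

    Returns a canned response here so the sketch is runnable without an API key.
    """
    return '{"target_label": "red cup", "target_position": [0.6, -0.1, 0.2]}'


def blend_commands(operator_vel, autonomous_vel, alpha=0.5):
    """Linear arbitration: weight the autonomous command by alpha, the operator's by (1 - alpha)."""
    return tuple(alpha * a + (1 - alpha) * o for o, a in zip(operator_vel, autonomous_vel))


if __name__ == "__main__":
    # Scene as reported by a vision model (assumed detections).
    objects = [SceneObject("red cup", (0.6, -0.1, 0.2)),
               SceneObject("blue box", (0.3, 0.4, 0.0))]

    # The LLM grounds the operator's command into a concrete subgoal.
    subgoal = json.loads(query_llm(build_prompt("pick up the red cup", objects)))

    # Toy proportional controller drives toward the subgoal; the operator's
    # joystick velocity is blended in rather than replaced.
    operator_vel = (0.10, 0.00, 0.05)
    autonomous_vel = tuple(0.5 * p for p in subgoal["target_position"])
    print("Blended command:", blend_commands(operator_vel, autonomous_vel, alpha=0.6))
```

In this framing the operator acts as a global planner who only intervenes through language or low-bandwidth corrections, which is one plausible way the reduced mode switching described in the abstract could be realized.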
Submission Number: 2