Attacking LLM-based Robot Intelligence for Long-horizon Tasks

Published: 20 Jun 2025, Last Modified: 20 Jun 2025RSS 2025 Workshop ReliableRoboticsEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Robot Planning, Robot Learning: Foundation Models, Assistive, Entertainment and Service Robots
TL;DR: We present Robo-Troj, a backdoor attack on LLM-based task planners in robotics. This attack uses trigger words to activate malicious behaviors, revealing vulnerabilities in LLM-based planning and showcasing the need for stronger security in robotics.
Abstract: Abstract—Robots need task planning methods to achieve goals that require more than one action. Recently, large language models (LLMs) have demonstrated impressive performance in task planning. LLMs can generate a step-by-step solution using a description of actions and the goal. Despite the successes of LLMs in long-horizon tasks for robot intelligence, there is little research studying the security aspects of those systems. In this paper, we develop Robo-Troj, the first backdoor attack specifically designed for LLM-assisted robot planners. Our attack follows the standard practice of LLM usage in robotics where the backbone LLM is typically frozen and hosted in a central server limiting attacker’s reach. In contrast, our attack injects backdoor at the fine-tuning stage using a small set of task-specific parameters for each specific robot. In addition, we develop an optimization method for selecting multiple-trigger words that are most effective for different robot applications. For instance, one can use unique trigger words, e.g., “herical”, to activate a specific malicious behavior, e.g., cutting hand on a kitchen robot. Through demonstrating the vulnerability of current LLM-based planners, we aim to advance secured robot intelligence.
Submission Number: 7
Loading