Keywords: Backdoor Attacks, Vision-Language-Action Models, Embodied AI
Abstract: Vision-Language-Action (VLA) models have emerged as a popular approach to general-purpose embodied AI, enabling robots to interpret multimodal inputs and generate temporally coherent actions. Popular imitation learning methods, including diffusion-based and autoregressive approaches, typically rely on human-collected demonstrations, which often contain small execution errors such as pauses or irregular motions even when they consist only of successful trajectories. Because decision-making in robotics is sequential, even small errors can compound over time, eventually leading to task failure. In this work, we exploit this property to introduce a new class of clean-action backdoor attacks, which require only partial poisoning of demonstration trajectories while preserving overall rollouts and apparent task success. Unlike conventional backdoors, our approach is more difficult to detect, since it conceals malicious behaviors within natural error patterns rather than in obvious trajectory alterations. We validate our method by backdooring the $\pi_0$ model and testing on the LIBERO benchmark, where it achieves consistently high attack success rates while evading standard detection and remaining effective under clean-data fine-tuning. These findings highlight the urgent need for VLA-specific defenses that address sequential vulnerabilities in embodied AI systems.
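To make the clean-action poisoning idea concrete, the following is a minimal, hypothetical sketch of how a small fraction of demonstration trajectories could be partially poisoned while the episodes still succeed. The data layout (`images`, `actions` arrays), the visual patch trigger, the pause-style perturbation, and the poisoning ratio are all assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of clean-action partial trajectory poisoning.
# Assumptions (not from the paper): each demonstration is a dict with
# "images" of shape (T, H, W, 3) and "actions" of shape (T, A); the
# trigger is a small bright patch stamped on the camera image; the
# poisoned behavior is a brief "pause" (holding the previous action)
# over a short window, mimicking natural execution errors.

import numpy as np


def stamp_trigger(image: np.ndarray, size: int = 8) -> np.ndarray:
    """Place a small bright patch in the image corner (hypothetical trigger)."""
    img = image.copy()
    img[:size, :size] = 255
    return img


def poison_episode(episode: dict, window: int = 5, rng=None) -> dict:
    """Inject a pause-like error into a short action window and add the trigger."""
    rng = rng or np.random.default_rng()
    images = episode["images"].copy()
    actions = episode["actions"].copy()
    T = actions.shape[0]

    # Pick a random start for the poisoned window away from the episode ends,
    # so the demonstration still completes the task (clean-action property).
    start = int(rng.integers(low=1, high=max(2, T - window)))
    end = min(T, start + window)

    # Stamp the trigger only on frames inside the poisoned window.
    for t in range(start, end):
        images[t] = stamp_trigger(images[t])

    # Replace the window's actions with a hold of the previous action,
    # imitating a natural pause rather than an obviously anomalous motion.
    actions[start:end] = actions[start - 1]

    return {"images": images, "actions": actions}


def poison_dataset(episodes: list, ratio: float = 0.1, seed: int = 0) -> list:
    """Poison a small fraction of episodes; the rest remain untouched."""
    rng = np.random.default_rng(seed)
    n_poison = int(len(episodes) * ratio)
    poisoned_idx = set(rng.choice(len(episodes), size=n_poison, replace=False))
    return [
        poison_episode(ep, rng=rng) if i in poisoned_idx else ep
        for i, ep in enumerate(episodes)
    ]
```

At deployment, presenting the trigger would elicit the learned pause-like behavior, whose compounding effect over the sequential rollout is what the attack relies on; the sketch only illustrates the data-side mechanism.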
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25090