Abstract: Deep neural networks (DNNs) have revolutionized video action recognition, but their increasing use in critical applications also makes them attractive targets for attacks. In particular, backdoor attacks have emerged as a potent threat, enabling attackers to manipulate a DNN's output by injecting a trigger, without affecting the model's performance on clean data. While the effectiveness of backdoor attacks on image recognition is well known, their impact on video action recognition is not yet fully understood. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects into it. Contrary to prior works that studied clean-label backdoor attacks against video action recognition and found them ineffective, our paper investigates the efficacy of poisoned-label backdoor attacks against video action recognition and demonstrates their effectiveness. We show that existing poisoned-label image backdoor attacks can be extended temporally in two ways, statically and dynamically. Furthermore, we explore real-world video backdoors to highlight the seriousness of this vulnerability. Finally, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough to achieve a high attack success rate. Our results highlight the urgent need for developing robust defenses against backdoor attacks on DNNs for video action recognition.
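The static versus dynamic temporal extension of an image trigger described above can be pictured with a minimal NumPy sketch. This is an illustrative sketch only, not the paper's implementation: it assumes videos are (T, H, W, C) arrays, and the patch trigger, offsets, and function names are hypothetical. A static extension stamps the same patch at a fixed location in every frame, a dynamic extension varies the patch location over time, and poisoned-label poisoning additionally relabels the stamped samples to the attacker's target class.

```python
import numpy as np

def apply_static_trigger(video, trigger, top=0, left=0):
    """Stamp the same spatial patch at a fixed location in every frame (static extension)."""
    # video: (T, H, W, C) array in [0, 1]; trigger: (h, w, C) patch
    poisoned = video.copy()
    h, w, _ = trigger.shape
    poisoned[:, top:top + h, left:left + w, :] = trigger
    return poisoned

def apply_dynamic_trigger(video, trigger, stride=4):
    """Shift the patch location from frame to frame (dynamic extension)."""
    poisoned = video.copy()
    T, H, W, _ = video.shape
    h, w, _ = trigger.shape
    for t in range(T):
        left = (t * stride) % (W - w)  # move the patch horizontally over time
        poisoned[t, :h, left:left + w, :] = trigger
    return poisoned

def poison_dataset(videos, labels, trigger, target_class,
                   poison_rate=0.05, dynamic=False, seed=0):
    """Poisoned-label attack: stamp a small fraction of training videos and relabel them."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(videos), size=int(poison_rate * len(videos)), replace=False)
    stamp = apply_dynamic_trigger if dynamic else apply_static_trigger
    for i in idx:
        videos[i] = stamp(videos[i], trigger)
        labels[i] = target_class  # label is changed to the attacker's target class
    return videos, labels
```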
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Sanghyun_Hong1
Submission Number: 1582