Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to poisoning attacks, which inject a trigger pattern into the training data and thereby manipulate the trained model into misclassifying data instances. In this article, we study poisoning attacks on video recognition models. We reveal the major limitations of state-of-the-art poisoning attacks in terms of stealthiness and attack effectiveness: (i) a frame-by-frame poisoning trigger may cause temporal inconsistency among the video frames, which can be leveraged to easily detect the attack; (ii) the feature-collision-based method for crafting poisoned videos can lack both generalization and transferability. To address these limitations, we propose a novel stealthy and efficient poisoning attack framework with the following advantages: (i) we design a 3D poisoning trigger as natural-looking textures, which maintains temporal consistency and human imperceptibility; (ii) we formulate an ensemble attack oracle as the optimization objective for crafting poisoned videos, which could construct convex-polytope-like adversarial subspaces in the feature space and thus gain better generalization; (iii) our poisoning attack readily extends to the black-box setting with good transferability. We have experimentally validated the effectiveness of our attack (e.g., success rates of up to 95% while poisoning less than 0.5% of the training set).