Abstract: Vision Transformers (ViTs) have made a splash in the field of computer vision (CV) and shaken the dominance of convolutional neural networks (CNNs). However, as ViTs move toward industrial deployment, backdoor attacks pose severe security challenges. The success of ViTs stems from the self-attention mechanism; yet, compared with CNNs, we find that this mechanism of capturing global information across patches makes ViTs more sensitive to patch-wise triggers. Motivated by this observation, we
carefully design a novel backdoor attack framework for ViTs, dubbed BadViT, which uses a universal patch-wise trigger to shift the model's attention from patches beneficial to classification to those containing the trigger, thereby turning the very mechanism on which ViTs rely against the model itself. Furthermore, we propose invisible variants of BadViT that increase the stealth of the attack by bounding the strength of the trigger perturbation. Extensive experiments demonstrate that BadViT is an efficient backdoor attack against ViTs: it depends on only a small number of poisoned samples, converges satisfactorily, and transfers to downstream tasks. Finally, we explore the vulnerability of ViTs to backdoor attacks from the perspective of existing advanced defense schemes.
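To make the poisoning step concrete, the following is a minimal PyTorch sketch of stamping a universal patch-wise trigger onto a batch of images. The function name, the patch placement, and the use of an L-infinity clamp for the invisible variant are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def apply_patchwise_trigger(images, trigger, patch_idx, patch_size=16, eps=None):
    """Stamp a universal patch-wise trigger onto a batch of images (sketch).

    images:  (B, C, H, W) tensor with values in [0, 1]
    trigger: (C, patch_size, patch_size) universal trigger pattern
    patch_idx: (row, col) position of the target patch in the ViT patch grid
    eps: optional L-infinity bound; if set, the trigger is clamped to
         [-eps, eps] and added as a perturbation (invisible variant),
         otherwise the patch is overwritten (visible variant). Hypothetical
         parameterization, assumed here for illustration.
    """
    poisoned = images.clone()
    r, c = patch_idx
    rs, cs = r * patch_size, c * patch_size
    region = poisoned[:, :, rs:rs + patch_size, cs:cs + patch_size]
    if eps is None:
        # Visible variant: replace the patch contents with the trigger.
        poisoned[:, :, rs:rs + patch_size, cs:cs + patch_size] = trigger
    else:
        # Invisible variant: bounded additive perturbation, kept in [0, 1].
        delta = trigger.clamp(-eps, eps)
        poisoned[:, :, rs:rs + patch_size, cs:cs + patch_size] = (region + delta).clamp(0, 1)
    return poisoned

# Example usage: poison a batch at the top-left patch with an 8/255 bound.
x = torch.rand(4, 3, 224, 224)
trig = torch.rand(3, 16, 16)
x_poisoned = apply_patchwise_trigger(x, trig, (0, 0), eps=8 / 255)
```

In a data-poisoning setup, such poisoned images would be paired with the attacker's target label during training so that the self-attention layers learn to attend to the triggered patch.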