A Closer Look at Robustness of Vision Transformers to Backdoor Attacks

Published: 01 Jan 2024, Last Modified: 27 Sept 2024, WACV 2024, CC BY-SA 4.0
Abstract: Transformer architectures are based on a self-attention mechanism that processes images as a sequence of patches. Because their design differs substantially from that of CNNs, it is important to take a closer look at their vulnerability to backdoor attacks and at how different transformer architectures affect robustness. In a backdoor attack, an attacker poisons a small portion of the training images with a specific trigger (backdoor) to be activated later. The poisoned model performs well on clean test images, but the attacker can manipulate its decisions by placing the trigger on an image at test time. In this paper, we compare state-of-the-art architectures through the lens of backdoor attacks, focusing on how attention mechanisms affect robustness. We observe that the well-known vision transformer (ViT) is the least robust architecture, while ResMLP, which belongs to the class of feed-forward networks (FFNs), is the most robust to backdoor attacks among state-of-the-art architectures. We also find an intriguing difference between transformers and CNNs: interpretation algorithms effectively highlight the trigger on test images for transformers but not for CNNs. Based on this observation, we show that a test-time image blocking defense reduces the attack success rate by a large margin for transformers. We further show that such blocking mechanisms can be incorporated during training to improve robustness even more. We believe our experimental findings will encourage the community to better understand how architectural building blocks contribute to robustness against backdoor attacks. Code is available at: https://github.com/UCDvision/backdoor_transformer.git
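To make the test-time blocking idea concrete, here is a minimal sketch of one way such a defense could work. It assumes a saliency map (e.g., from an interpretation method such as attention rollout or GradCAM) that peaks on the trigger for transformer models, and it simply masks the most salient region before classification. The function name, patch size, and the use of the single saliency maximum are illustrative assumptions, not the paper's exact implementation; see the linked repository for the authors' code.

```python
import numpy as np

def block_top_patch(image, saliency, patch=16):
    """Zero out the image region with the highest saliency.

    image:    H x W x C array (test image).
    saliency: H x W array from an interpretation method; for transformers
              this map tends to peak on the backdoor trigger.
    patch:    side length of the square region to block (assumed value).
    """
    h, w = saliency.shape
    # Locate the most salient pixel.
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    # Center a blocking window on it, clipped to the image boundaries.
    y0, x0 = max(0, y - patch // 2), max(0, x - patch // 2)
    y1, x1 = min(h, y0 + patch), min(w, x0 + patch)
    blocked = image.copy()
    blocked[y0:y1, x0:x1, :] = 0  # mask the suspected trigger region
    return blocked
```

In use, the blocked image would be passed to the classifier in place of the original; if the saliency map indeed localized the trigger, the backdoor is no longer activated while clean accuracy is largely preserved.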