Abstract: Vision transformers (ViTs) are among the most prominent state-of-the-art models for vision tasks and are widely deployed in safety-critical applications ranging from health care to automotive systems. However, this widespread deployment brings the risk of adversarial attacks on ViTs to the forefront. Understanding the vulnerability of vision transformers to such attacks is therefore necessary before deploying them in real-world scenarios. Adversarial patch attacks, in particular, represent a practical threat to the viability of ViT-based real-world applications. This study examines the vulnerability of vision transformers to both single- and multi-patch adversarial attacks, gauging their robustness across benchmark datasets including CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet-1k. Our experiments reveal that poly-multi patch attacks constitute formidable adversarial threats: vision transformers are more vulnerable to poly-multi attacks than to single, mono, and split-multi attacks. Additionally, we qualitatively analyze the impact of patch location on attack efficacy, providing insights into the factors that influence effectiveness. Through this study, we aim to deepen the understanding of vision transformers' susceptibility to adversarial patch attacks and to contribute to the development of strategies for strengthening their security and resilience in real-world applications.