Abstract: Deep neural networks can be fooled by small, imperceptible perturbations called adversarial
examples. Although these examples are carefully crafted, existing attack methods raise two major concerns:
in some cases the generated perturbations are much larger than the minimal adversarial perturbation, while
in others the attack requires so many iterations that it becomes impractical. Moreover, existing sparse attacks
are either too computationally complex or not sparse enough to remain imperceptible. An attack should
therefore be both fast and minimal in terms of ℓ2-norm. In this work, we use a dictionary learning technique
to generate sparse adversarial examples based on feature maps of target images, and we present two novel
algorithms that tune the dictionary learning process and the feature map selection. Results on MNIST and
ImageNet show that our attack is better than or competitive with state-of-the-art methods. Compared with
sparse attacks recently introduced in the literature, our method achieves a comparable attack success rate
with a smaller ℓ2-norm. We also tested the efficacy of our attack in the presence of defense mechanisms,
and none of the defenses was able to counter the effect of our proposed attack.
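To make the core idea concrete, the sketch below illustrates dictionary learning with sparse coding over feature-map patches, which is the general mechanism the abstract describes. It is a minimal illustration only: the shapes, hyperparameters (e.g., the ℓ2 budget `epsilon`), and the use of scikit-learn's `DictionaryLearning` are our assumptions, not the authors' two algorithms.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Stand-in for a feature map of the target image (e.g., from a CNN layer),
# flattened into patch vectors. Real feature maps would replace this.
feature_patches = rng.normal(size=(200, 64))  # 200 patches, 64-dim each

# Learn an overcomplete dictionary; OMP with few nonzero coefficients
# enforces sparsity of the codes.
dico = DictionaryLearning(
    n_components=128,             # overcomplete: 128 atoms for 64-dim patches
    transform_algorithm="omp",    # orthogonal matching pursuit
    transform_n_nonzero_coefs=5,  # at most 5 active atoms per patch
    max_iter=20,
    random_state=0,
)
codes = dico.fit_transform(feature_patches)   # sparse codes, shape (200, 128)

# Reconstruct patches from the sparse codes; the residual gives a sparse
# direction in feature space from which a perturbation can be built.
recon = codes @ dico.components_
perturbation = recon - feature_patches

# Rescale to a small l2 budget (epsilon is an assumed hyperparameter).
epsilon = 0.5
perturbation *= epsilon / (np.linalg.norm(perturbation) + 1e-12)
print("l2 norm of perturbation:", np.linalg.norm(perturbation))
print("fraction of nonzero code entries:", np.mean(codes != 0))
```

In this sketch, sparsity comes from limiting the number of active dictionary atoms per patch, which mirrors the abstract's goal of perturbations that are both sparse and small in ℓ2-norm.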