Abstract: Deep neural networks (DNNs) have recently been found to be vulnerable to adversarial examples. Several previous works attempt to relate the low-frequency or high-frequency parts of adversarial inputs to the robustness of models. However, these studies lack comprehensive experiments and thorough analyses, and they even yield contradictory results. This work comprehensively explores the connection between the robustness of models and the properties of adversarial perturbations in the frequency domain, using six classic attack methods and three representative datasets. We visualize the distribution of successful adversarial perturbations using the Discrete Fourier Transform and, through a proposed quantitative analysis, test how effectively different frequency bands of the perturbations reduce classifier accuracy. Experimental results show that the characteristics of successful adversarial perturbations in the frequency domain can vary from dataset to dataset, while their intensities are greater in the effective frequency bands. We analyze these phenomena by combining the principles of the attacks with the properties of the datasets, and we offer a complete view of adversarial examples from the frequency-domain perspective, which helps explain the contradictions among previous works and provides insights for future research.
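To make the analysis pipeline concrete, the following is a minimal sketch, not the authors' code, of the two operations the abstract describes: computing the DFT magnitude spectrum of a perturbation and isolating a frequency band of it before measuring its effect on a classifier. The function names (`dft_magnitude`, `band_pass`), the radial definition of a "band", and the synthetic perturbation are illustrative assumptions.

```python
import numpy as np


def dft_magnitude(perturbation):
    """Magnitude spectrum of a 2D perturbation, with the zero frequency centered."""
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    return np.abs(spectrum)


def band_pass(perturbation, r_low, r_high):
    """Keep only frequency components whose normalized radius lies in [r_low, r_high)."""
    h, w = perturbation.shape
    yy, xx = np.meshgrid(np.arange(h) - h / 2, np.arange(w) - w / 2, indexing="ij")
    radius = np.sqrt(yy ** 2 + xx ** 2) / (min(h, w) / 2)  # 0 at DC, ~1 at the image edge
    mask = (radius >= r_low) & (radius < r_high)
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for an adversarial perturbation; a real study would use attack outputs.
    delta = rng.normal(scale=8 / 255, size=(32, 32))
    print(dft_magnitude(delta).shape)              # (32, 32) magnitude spectrum for visualization
    low = band_pass(delta, 0.0, 0.25)              # low-frequency band of the perturbation
    high = band_pass(delta, 0.25, np.inf)          # complementary high-frequency band
    print(np.allclose(delta, low + high))          # the two bands partition the perturbation
```

In an evaluation like the one described, each band-limited perturbation would be added back to the clean image and fed to the classifier, so the drop in accuracy attributable to each frequency band can be measured separately.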