Keywords: adversarial attack, deep learning, adversarial examples, perceptual quality
Abstract: Neural networks are known to be vulnerable to adversarial examples, which are obtained by adding intentionally crafted perturbations to original images. However, these perturbations degrade the perceptual quality of the images, making the modifications more noticeable to human observers. In this paper, we propose two separate attack-agnostic methods that increase perceptual quality, measured in terms of the perceptual distance metric LPIPS, while preserving the target fooling rate. The first method intensifies the perturbations in the high-variance areas of the image. It can be used in both white-box and black-box settings for any type of adversarial example, at only the computational cost of calculating the pixel-based image variance. The second method minimizes the perturbations of already generated adversarial examples, independent of the attack type: the distance between a benign example and its adversarial counterpart is reduced until the adversarial example reaches the decision boundary of the true class. We show that the two methods can also be used in conjunction to improve the perceptual quality of adversarial examples, and we demonstrate quantitative improvements on the CIFAR-10 and NIPS 2017 Adversarial Learning Challenge datasets.
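As a rough illustration of the first method, the variance-based weighting step might be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the function names, the 3x3 sliding-window variance, the max-normalization of the variance map, and the L-inf budget `eps` are all illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def local_variance(img, k=3):
    """Per-pixel variance over a k x k neighborhood of a 2-D image in [0, 1].
    A color image would be handled channel-wise in the same way."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    # Stack the k*k shifted views and take the variance across them.
    windows = np.stack([
        padded[i:i + img.shape[0], j:j + img.shape[1]]
        for i in range(k) for j in range(k)
    ])
    return windows.var(axis=0)

def variance_weighted(perturbation, img, eps=8 / 255):
    """Reweight a perturbation toward high-variance (textured) regions,
    where changes are less visible, then re-clip to the L-inf budget."""
    var = local_variance(img)
    weight = var / (var.max() + 1e-12)  # normalize variance map to [0, 1]
    return np.clip(perturbation * weight, -eps, eps)
```

In flat, low-variance regions the weight approaches zero, so the perturbation there is suppressed, while textured regions retain it; this is the intuition behind concentrating perturbation energy where it is least perceptible.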